
Choosing the Right File System for kdb+: A Case Study with KX Nano

The performance of a kdb+ system is critically dependent on the throughput and latency of its underlying storage. In a Linux environment, the file system is the foundational layer that enables data management on a given storage partition.

This paper presents a comparative performance analysis of various file systems using the KX Nano benchmarking utility. The evaluation was conducted across two distinct test environments with different operating systems, storage bandwidths (6500 vs 14000 MB/s), and IOPS capabilities (700K vs 2500K).

File systems tested:

  1. ext4 (rev 1)
  2. XFS (V5)
  3. Btrfs (v6.6.3, compression off)
  4. F2FS (v1.16.0, compression off)
  5. ZFS (v2.2.2, compression off)

Summary

No single file system demonstrated superior performance across all tested metrics; the optimal choice depends on the specific operations you need to accelerate. Furthermore, the host operating system (e.g., Red Hat Enterprise Linux vs. Ubuntu) constrains the set of available and supported file systems.

Our key recommendations are as follows:

  • For write-intensive workloads where data ingestion rate is the primary driver, XFS is the recommended file system.

    • XFS consistently demonstrated the highest write throughput, particularly under concurrent write scenarios. For instance, a kdb+ set operation on a large float vector (31 million elements) executed 5.6x faster on XFS than on ext4 and nearly 70x faster than on ZFS.
    • This superior write performance translates to significant speedups in other I/O-heavy operations. Parallel disk sorting was 3.1x faster, and applying the p# (parted) attribute was 6.9x faster on XFS compared to ext4. Consequently, workloads like end-of-day (EOD) data processing will achieve the best performance with XFS.
  • For read-intensive workloads where query latency is paramount, the choice is nuanced:

    • On Red Hat Enterprise Linux 9, ext4 holds a slight advantage for queries dominated by sequential reads. For random reads, its performance was comparable to XFS.
    • On Ubuntu, F2FS demonstrated a performance advantage in random read operations. However, this advantage shifted decisively to XFS when the data was already resident in the operating system's page cache.

kdb+ also supports data tiering. For tiered data architectures (e.g., hot, mid, and cold tiers), a hybrid approach is advisable, as sketched after the list below.

  • Hot tier: Data is frequently queried and often resides in the page cache. For this tier, a read-optimized file system like ext4 or XFS is effective.
  • Mid tier: Data is queried less often, meaning reads are more likely to come directly from storage. In this scenario, F2FS's stronger random read performance from storage provides some advantage.
  • Cold tier: Data is typically compressed and stored on high-latency, cost-effective media like HDDs or object storage. While kdb+ has built-in compression support, file systems like Btrfs, F2FS, and ZFS also offer this feature. The performance implications of file-system-level compression warrant a separate, dedicated study.
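
Such a hybrid layout can be expressed with a segmented kdb+ database: each line of par.txt in the HDB root names a segment directory, and each segment can sit on a different mount point and therefore a different file system. As a sketch with hypothetical mount points, par.txt might contain:

    /mnt/hot_xfs/db
    /mnt/mid_f2fs/db
    /mnt/cold_btrfs/db

Here the first segment would live on an XFS (or ext4) mount for the hot tier, the second on F2FS for the mid tier, and the third on a compression-capable file system for the cold tier.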

Disclaimer: These guidelines are specific to the tested hardware and workloads. We strongly encourage readers to perform their own benchmarks that reflect their specific application profiles. To facilitate this, the benchmarking suite used in this study is included with the KX Nano codebase, available on GitHub.

Details

All benchmarks were executed in September 2025 using kdb+ 4.1 (2025.04.28) and KX Nano 6.4.5. Each kdb+ process was configured to use 8 worker threads (-s 8).
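
For reference, this corresponds to launching each process as shown below; inside the session, system"s" reports the configured worker-thread count.

    $ q -s 8
    q)system"s"
    8i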

We used the default vector lengths of KX Nano (a short write-test sketch using the "large" length follows this list):

  * tiny: 2047
  * small: 63k
  * medium: 127k
  * large: 31m
  * huge: 1000m
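
As a rough illustration of what a write test measures (a sketch, not the KX Nano implementation; the mount point is hypothetical), persisting a "large" float vector with set looks like this:

    large:31000000?1e                          / random float vector of the "large" length
    \t `:/mnt/xfs/bench/floatlarge set large   / elapsed ms to persist it on the file system under test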

Test 1: Red Hat Enterprise Linux 9 with Intel NVMe SSD (PCIe 4.0)

This first test configuration utilized an Intel NVMe SSD on a server running Red Hat Enterprise Linux (RHEL) 9.3. In line with the file systems officially supported by RHEL 9, the comparison was limited to ext4 and XFS.

Test Setup

Storage:
  * Type: 3.84 TB Intel SSD D7-P5510
  * Interface: PCIe 4.0 x4, NVMe
  * Sequential R/W: 6500 MB/s / 3400 MB/s
  * Random Read: 700K IOPS (4K)
  * Latency: Random Read: 82 µs (4K); Sequential Read/Write: 10 µs / 13 µs (4K)
CPU: Intel(R) Xeon(R) 6747P (2 sockets, 48 cores per socket, 2 threads per core)
Memory: 502 GiB, DDR5 @ 6400 MT/s
OS: RHEL 9.3 (kernel 5.14.0-362.8.1.el9_3.x86_64)

The values presented in the result tables represent throughput in MB/s, where higher figures indicate better performance. The "Ratio" column quantifies the performance of XFS relative to ext4 (e.g., a value of 2 indicates XFS was twice as fast).

Write

We split the write results into two tables. The first table contains the "high-impact" tests and should be given more weight. These tests relate to the EOD (write, sort, attribute application) and EOI (append) work that is often the bottleneck of ingestion.
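
As a hedged sketch of what the high-impact EOD steps look like in q (the database path and toy table are hypothetical, and KX Nano's actual implementation may differ):

    trade:([]sym:`b`a`c;time:3#.z.p;price:3?100e)                    / toy table
    `:/mnt/xfs/db/2025.09.01/trade/ set .Q.en[`:/mnt/xfs/db] trade   / write: persist a splayed partition
    `sym xasc `:/mnt/xfs/db/2025.09.01/trade/                        / sort: on-disk sort by sym
    @[`:/mnt/xfs/db/2025.09.01/trade/;`sym;`p#]                      / attribute: apply p# to the sym column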

Single kdb+ process:

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
read mem write disk add attribute 259 231 1.12
read write disk disk sort 105 97 1.09
write disk open append mid float, sync once 1038 870 1.19
open append mid sym, sync once 932 841 1.11
write float large 2170 1304 1.66
write int huge 3338 2157 1.55
write int medium 3070 1999 1.54
write int small 910 1119 0.81
write int tiny 100 50 2.01
write sym large 1480 1289 1.15
GEOMETRIC MEAN 776 605 1.28
MAX RATIO 3338 2157 2.01

Observation: XFS is almost always faster than ext4. In the critical tests, the advantage is about 28% on average, with a maximum difference of 101%.

The performance of the less critical write operations is below. The Linux sync command synchronizes cached data to permanent storage. This data includes modified superblocks, modified inodes, and delayed reads and writes, among others. EOD and EOI solutions often use sync to improve resiliency by ensuring that data is persisted to storage rather than held temporarily in caches. A sync is typically much faster than the set command because Linux writes data back in the background (compare the speed of write float large and sync float large). The throughput figures for sync are not always meaningful because sync does not necessarily need to flush the entire vector.
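
A minimal sketch of this pattern from within q, assuming the standard Linux sync binary is on the path (not necessarily how KX Nano invokes it):

    `:/mnt/xfs/bench/floatlarge set 31000000?1e   / the written pages may still sit in the page cache
    system"sync"                                  / ask the kernel to flush cached writes to storage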

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
write disk append small, sync once 753 484 1.55
append tiny, sync once 549 368 1.49
open append small, sync once 937 812 1.15
open append tiny, sync once 200 210 0.96
open replace int tiny 261 263 0.99
open replace random float large 16 15 1.05
open replace random int huge 5 4 1.16
open replace random int medium 561 550 1.02
open replace random int small 784 809 0.97
open replace sorted int huge 5 5 1.06
sync column after parted attribute 183027 30812020 0.01
sync float large 159533 124762 1.28
sync float large after replace 158292 153759 1.03
sync int huge 82528 82383 1.00
sync int huge after replace 1148164 1076351 1.07
sync int huge after sorted replace 1151184 958083 1.20
sync int medium 44866 39724 1.13
sync int small 6890 6655 1.04
sync int tiny 232 221 1.05
sync sym large 232276 182916 1.27
sync table after sort 61306010 56924120 1.08
GEOMETRIC MEAN 5325 6116 0.87
MAX RATIO 61306010 56924120 1.55

64 kdb+ processes:

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
read mem write disk add attribute 12858 1876 6.86
read write disk disk sort 2847 903 3.15
write disk open append mid float, sync once 1347 1368 0.98
open append mid sym, sync once 2300 2118 1.09
write float large 62892 11133 5.65
write int huge 2455 2488 0.99
write int medium 47404 5879 8.06
write int small 28002 5433 5.15
write int tiny 2637 2934 0.90
write sym large 60629 17170 3.53
GEOMETRIC MEAN 9057 3420 2.65
MAX RATIO 62892 17170 8.06

Observation: The results show that XFS consistently and significantly outperformed ext4 in write-intensive operations. In critical ingestion and EOD tasks, write throughput on XFS was on average roughly 2.6 times that of ext4. This advantage peaked in specific operations, such as applying the p# attribute and persisting a medium-length integer vector, where XFS was a remarkable 7x and 8x faster than ext4, respectively.

The performance of the less critical write operations is below.

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
write disk append small, sync once 1726 1686 1.02
append tiny, sync once 2294 2120 1.08
open append small, sync once 1391 1399 0.99
open append tiny, sync once 2385 1463 1.63
open replace int tiny 12298 13634 0.90
open replace random float large 232 220 1.06
open replace random int huge 114 103 1.11
open replace random int medium 18188 18922 0.96
open replace random int small 28371 32363 0.88
open replace sorted int huge 59 60 0.99
sync column after parted attribute 139202 199845700 0.00
sync float large 98447 97428 1.01
sync float large after replace 192094 193340 0.99
sync int huge 230644 231697 1.00
sync int huge after replace 6272368 7152017 0.88
sync int huge after sorted replace 7883493 7317134 1.08
sync int medium 194125 173236 1.12
sync int small 132313 140824 0.94
sync int tiny 5592 6402 0.87
sync sym large 148040 147264 1.01
sync table after sort 111869100 373266900 0.30
GEOMETRIC MEAN 29819 43975 0.68
MAX RATIO 111869100 373266900 1.63

ext4 is faster at sync operations, but this difference is negligible compared to the much longer write times required for sorting and applying attributes.

Read

We divide the read tests into two categories depending on the source of the data: storage (cold) versus memory, i.e. the page cache (hot).
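
The two cases can be reproduced with a simple q experiment (hypothetical path; dropping the page cache requires root privileges outside q):

    / cold read: after the page cache has been dropped
    / (e.g. as root: echo 3 > /proc/sys/vm/drop_caches), the data comes from storage
    \t v:get `:/mnt/xfs/bench/floatlarge
    / hot read: the file is now resident in the page cache, so the re-read avoids the device
    \t v:get `:/mnt/xfs/bench/floatlarge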

Single kdb+ process:

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
read disk mmap, random read 1M 597 590 1.01
mmap, random read 4k 20 19 1.05
mmap, random read 64k 200 192 1.05
random read 1M 616 546 1.13
random read 4k 21 19 1.10
random read 64k 207 184 1.13
sequential read binary 689 681 1.01
read disk write mem sequential read float large 1991 845 2.36
sequential read int huge 2039 870 2.34
sequential read int medium 624 472 1.32
sequential read int small 318 254 1.25
sequential read int tiny 26 23 1.14
GEOMETRIC MEAN 259 205 1.26
MAX RATIO 2039 870 2.36

Observation: XFS reads data sequentially from disk faster than ext4. Apart from this, the differences are negligible.

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
read mem mmap, random read 1M 2468 2451 1.01
mmap, random read 4k 261 263 0.99
mmap, random read 64k 1743 1704 1.02
random read 1M 3040 2997 1.01
random read 4k 1272 983 1.29
random read 64k 3001 3003 1.00
read mem write mem sequential read binary 2527 2513 1.01
sequential reread float large 15041 15229 0.99
sequential reread int huge 33832 33912 1.00
sequential reread int medium 8185 8119 1.01
sequential reread int small 2143 2070 1.03
GEOMETRIC MEAN 3141 3050 1.03
MAX RATIO 33832 33912 1.29

Observation: There is no significant performance difference between XFS and ext4 with a single kdb+ reader if the data comes from the page cache.

64 kdb+ processes:

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
read disk mmap, random read 1M 2825 2821 1.00
mmap, random read 4k 544 534 1.02
mmap, random read 64k 1075 1073 1.00
random read 1M 2793 2786 1.00
random read 4k 547 544 1.01
random read 64k 1072 1069 1.00
sequential read binary 99058 5114 19.37
read disk write mem sequential read float large 1947 2825 0.69
sequential read int huge 3123 3250 0.96
sequential read int medium 2043 5358 0.38
sequential read int small 1537 6036 0.25
sequential read int tiny 421 1847 0.23
GEOMETRIC MEAN 1896 2100 0.90
MAX RATIO 99058 6036 19.37

Observation: Despite XFS's edge with a single reader, ext4 outperforms XFS in sequential reads when multiple kdb+ processes read different data in parallel. This scenario is common in a pool of HDBs, where multiple concurrent queries with non-selective filters result in numerous parallel sequential reads from disk.

For random reads that require accessing the storage device directly (a cache miss), we observed no meaningful performance difference between ext4 and XFS.

Test Type  Test  XFS (MB/s)  ext4 (MB/s)  Ratio
read mem mmap, random read 1M 24627 39646 0.62
mmap, random read 4k 5617 5525 1.02
mmap, random read 64k 22215 23249 0.96
random read 1M 151307 132294 1.14
random read 4k 98365 64205 1.53
random read 64k 161306 158559 1.02
read mem write mem sequential read binary 27536 28720 0.96
sequential reread float large 1135453 1265438 0.90
sequential reread int huge 1459501 1518556 0.96
sequential reread int medium 568919 637707 0.89
sequential reread int small 120474 120112 1.00
GEOMETRIC MEAN 107897 110161 0.98
MAX RATIO 1459501 1518556 1.53

Observation: There is no clear winner in read performance when the data comes from the page cache and there are multiple readers.

Test 2: Ubuntu with Samsung NVMe SSD (PCIe 5.0)

Test Setup

Storage:
  * Type: 3.84 TB SAMSUNG MZWLO3T8HCLS-00A07
  * Interface: PCIe 5.0 x4
  * Sequential R/W: 14000 MB/s / 6000 MB/s
  * Random Read: 2500K IOPS (4K)
CPU: AMD EPYC 9575F (Turin), 2 sockets, 64 cores per socket, 2 threads per core, 256 MB L3 cache, SMT off
Memory: 2.2 TB, DDR5 @ 6400 MT/s (12 channels per socket)
OS: Ubuntu 24.04.3 LTS (kernel: 6.8.0-83-generic)

Since compression is enabled by default in ZFS, we disabled it during the pool creation (-O compression=off) to ensure a fair comparison with the other file systems.

The values presented in the result tables are ratios of XFS throughput to the given file system's throughput (e.g., a value of 2 indicates XFS was twice as fast).

Write

Single kdb+ process:

Test Type  Test  ext4  Btrfs  F2FS  ZFS
read mem write disk add attribute 1.1 1.1 1.0 1.0
read write disk disk sort 1.1 1.1 1.1 1.1
write disk open append mid float, sync once 1.7 1.6 1.8 1.6
open append mid sym, sync once 1.2 1.2 1.2 1.0
write float large 2.8 1.9 2.7 0.9
write int huge 2.9 1.9 2.6 2.6
write int medium 2.4 1.8 2.7 1.0
write int small 1.3 4.4 1.1 1.4
write int tiny 1.2 0.7 0.8 1.2
write sym large 1.2 1.1 1.1 1.0
GEOMETRIC MEAN 1.6 1.5 1.5 1.2
MAX RATIO 2.9 4.4 2.7 2.6

Observation: XFS outperforms all other file systems if a single kdb+ process writes the data.

The performance of the less critical write operations is below.

Test Type  Test  ext4  Btrfs  F2FS  ZFS
write disk append small, sync once 3.1 2.0 2.2 1.3
append tiny, sync once 2.0 1.4 1.3 1.2
open append small, sync once 1.5 1.5 1.6 0.9
open append tiny, sync once 1.2 1.0 1.1 2.4
open replace int tiny 1.2 1.2 1.0 1.0
open replace random float large 19.1 24.7 17.6 51.6
open replace random int huge 30.0 40.3 28.5 99.2
open replace random int medium 0.9 1.4 0.9 0.8
open replace random int small 0.9 1.4 0.9 0.7
open replace sorted int huge 14.0 48.5 12.5 9.7
sync float large 1.1 1.4 1.0 1.2
sync float large after replace 0.9 1.5 1.2 6.4
sync int huge 1.3 1.2 1.0 0.3
sync int huge after replace 0.2 1.5 0.1 4.7
sync int huge after sorted replace 0.2 1.8 0.1 5.2
sync int medium 1.2 1.3 1.7 0.9
sync int small 0.8 1.1 1.2 1.1
sync int tiny 1.1 1.4 1.0 0.9
sync sym large 1.8 1.5 1.4 1.4
sync table after sort 1.1 0.7 3.7 8.5
GEOMETRIC MEAN 1.6 2.2 1.5 2.5
MAX RATIO 30.0 48.5 28.5 99.2

Observation: XFS significantly outperformed all other file systems when only a random part of a vector needs to be overwritten (see the open replace tests).

64 kdb+ processes:

Test Type  Test  ext4  Btrfs  F2FS  ZFS
read mem write disk add attribute 3.0 2.9 3.2 3.3
read write disk disk sort 2.3 3.6 2.4 2.2
write disk open append mid float, sync once 1.1 0.8 2.7 1.7
open append mid sym, sync once 1.2 0.9 2.2 1.6
write float large 3.1 2.9 48.4 69.2
write int huge 1.1 1.7 4.6 3.5
write int medium 3.0 2.7 45.5 2.8
write int small 1.3 4.1 13.7 1.9
write int tiny 1.5 10.7 3.5 5.2
write sym large 1.2 1.1 10.6 14.2
GEOMETRIC MEAN 1.7 2.4 6.9 4.2
MAX RATIO 3.1 10.7 48.4 69.2

Observation: XFS significantly outperformed all other file systems in writing. Its margin can be substantial: for example, persisting a large float vector (the set operation) is over 69 times faster on XFS than on ZFS.

The performance of the less critical write operations is below.

Test Type  Test  ext4  Btrfs  F2FS  ZFS
write disk append small, sync once 1.2 1.1 3.4 1.3
append tiny, sync once 0.7 1.8 7.1 1.3
open append small, sync once 1.0 1.0 3.8 1.6
open append tiny, sync once 1.4 2.0 19.5 2.6
open replace int tiny 0.9 47.7 5.6 2.9
open replace random float large 5.8 2262.0 196.7 66.0
open replace random int huge 0.0 0.0 0.0 0.0
open replace random int medium 0.9 138.8 7.2 1.2
open replace random int small 0.8 182.6 38.1 1.0
open replace sorted int huge 0.0 1.0 0.1 0.0
sync float large 1.0 1.0 0.5 0.5
sync float large after replace 0.5 1.5 0.1 1.9
sync int huge 1.0 0.0 0.6 0.0
sync int huge after replace 29.7 11.8 5.5 86.3
sync int huge after sorted replace 0.1 0.1 0.0 0.2
sync int medium 1.1 1.7 6.3 3.1
sync int small 0.9 1.7 2.5 2.8
sync int tiny 0.8 1.0 1.3 2.5
sync sym large 1.0 1.1 0.5 1.0
sync table after sort 1.2 0.9 1.1 131.5
GEOMETRIC MEAN 0.5 2.3 1.4 1.1
MAX RATIO 29.7 2262.0 196.7 131.5

Read

Single kdb+ process:

Test Type  Test  ext4  Btrfs  F2FS  ZFS
read disk mmap, random read 1M 1.1 4.4 1.2 4.3
mmap, random read 4k 1.0 1.3 1.1 1.7
mmap, random read 64k 1.0 6.5 1.0 2.0
random read 1M 1.1 4.3 1.2 4.3
random read 4k 1.0 1.2 1.1 1.8
random read 64k 1.0 6.6 1.0 2.1
sequential read binary 2.1 4.5 1.3 0.8
read disk write mem sequential read float large 1.2 0.8 1.4 3.0
sequential read int huge 1.3 1.0 1.4 3.0
sequential read int medium 7.3 4.9 10.1 14.9
sequential read int small 3.0 2.9 2.0 6.9
sequential read int tiny 1.7 1.9 1.2 3.7
GEOMETRIC MEAN 1.5 2.6 1.5 3.1
MAX RATIO 7.3 6.6 10.1 14.9

Observation: XFS excels in reading from disk if there is a single kdb+ reader.

Test Type  Test  ext4  Btrfs  F2FS  ZFS
read mem mmap, random read 1M 1.1 1.1 1.2 1.1
mmap, random read 4k 0.9 1.0 1.0 0.8
mmap, random read 64k 1.1 1.0 1.1 1.0
random read 1M 1.3 1.4 1.2 1.5
random read 4k 1.0 1.0 1.1 0.7
random read 64k 1.1 1.0 1.2 1.0
read mem write mem sequential read binary 1.0 1.0 1.0 1.0
sequential reread float large 1.7 2.4 2.5 2.2
sequential reread int huge 1.9 2.1 2.3 1.9
sequential reread int medium 3.6 3.4 3.7 2.6
sequential reread int small 0.9 1.0 0.9 0.9
GEOMETRIC MEAN 1.3 1.4 1.4 1.2
MAX RATIO 3.6 3.4 3.7 2.6

Observation: XFS excels in (sequential) reading from page cache if there is a single kdb+ reader.

64 kdb+ processes:

Test Type  Test  ext4  Btrfs  F2FS  ZFS
read disk mmap, random read 1M 1.0 2.2 0.9 1.2
mmap, random read 4k 1.0 1.1 0.9 1.6
mmap, random read 64k 0.8 1.5 0.7 0.9
random read 1M 1.0 2.2 0.9 1.2
random read 4k 1.0 1.0 0.9 1.6
random read 64k 0.8 1.5 0.8 0.9
sequential read binary 8.9 8.2 8.9 10.5
read disk write mem sequential read float large 0.7 0.6 0.7 0.8
sequential read int huge 0.9 0.9 0.9 1.1
sequential read int medium 1.0 0.6 1.0 1.5
sequential read int small 1.1 0.7 1.1 1.6
sequential read int tiny 1.0 1.6 1.7 2.8
GEOMETRIC MEAN 1.1 1.3 1.1 1.5
MAX RATIO 8.9 8.2 8.9 10.5

Observation: F2FS maintains a performance advantage in parallel disk reads from multiple kdb+ processes (e.g., an HDB pool). The sole exception was binary reads (read1), a pattern not typically encountered in production kdb+ environments.

Test Type  Test  ext4  Btrfs  F2FS  ZFS
read mem mmap, random read 1M 1.1 1.1 1.1 1.1
mmap, random read 4k 1.0 1.0 1.0 1.8
mmap, random read 64k 1.6 1.1 1.1 2.2
random read 1M 1.1 1.0 1.1 1.0
random read 4k 1.0 1.0 1.0 0.9
random read 64k 1.0 1.0 1.1 1.0
read mem write mem sequential read binary 1.0 1.0 1.0 1.5
sequential reread float large 1.9 1.9 57.7 4.8
sequential reread int huge 1.6 1.7 13.5 1.9
sequential reread int medium 2.0 2.1 2.0 7.4
sequential reread int small 1.0 1.0 1.0 1.7
GEOMETRIC MEAN 1.3 1.2 2.0 1.8
MAX RATIO 2.0 2.1 57.7 7.4

Observation: The performance advantage of F2FS vanishes entirely when data is served from the page cache. XFS is the clear winner if data is read sequentially by multiple kdb+ processes.