Choosing the Right File System for kdb+: A Case Study with KX Nano¶
The performance of a kdb+ system is critically dependent on the throughput and latency of its underlying storage. In a Linux environment, the file system is the foundational layer that enables data management on a given storage partition.
This paper presents a comparative performance analysis of various file systems using the KX Nano benchmarking utility. The evaluation was conducted across two distinct test environments with different operating systems and storage hardware (sequential read bandwidth of 6500 vs. 14000 MB/s; random read IOPS of 700K vs. 2500K).
Summary¶
No single file system demonstrated superior performance across all tested metrics; the optimal choice depends on the primary workload characteristics and the specific operations you need to accelerate. Furthermore, the host operating system (e.g., Red Hat Enterprise Linux vs. Ubuntu) constrains the set of available and supported file systems.
Our key recommendations are as follows:
- For write-intensive workloads where data ingestion rate is the primary driver, XFS is the recommended file system.
    - XFS consistently demonstrated the highest write throughput, particularly under concurrent write scenarios. For instance, a kdb+ `set` operation on a large float vector (31 million elements) executed 5.5x faster on XFS than on ext4 and nearly 50x faster than on ZFS.
    - This superior write performance translates to significant speedups in other I/O-heavy operations. Parallel disk sorting was 3.4x faster, and applying the `p#` (parted) attribute was 6.6x faster on XFS compared to ext4. Consequently, workloads like end-of-day (EOD) data processing will achieve the best performance with XFS.
- For read-intensive workloads where query latency is paramount, the choice is nuanced:
    - On Red Hat Enterprise Linux 9, ext4 holds a slight advantage for queries dominated by sequential reads. For random reads, its performance was comparable to XFS.
    - On Ubuntu, ZFS excelled in random read scenarios. However, this performance advantage diminished significantly if the requested data was already available in the operating system's page cache.
- kdb+ also supports tiering. For tiered data architectures (e.g., hot, mid, and cold tiers), a hybrid approach is advisable (see the sketch after this list):
    - Hot tier: data is frequently queried and often resides in the page cache. For this tier, a read-optimized file system like ext4 or XFS is effective.
    - Mid/cold tier: data is queried less often, so reads are more likely to come directly from storage. In this scenario, ZFS's strong random read performance from storage provides a distinct advantage.
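One common way to realize such a hybrid layout is a segmented database, where a `par.txt` file in the HDB root lists the directories that hold the date partitions. A minimal sketch, assuming hypothetical mount points (an XFS or ext4 device for recent data, a ZFS pool for older data):

```
/fast_xfs/hdb/hot
/zfs_pool/hdb/cold
```

Each line names a directory containing date partitions, so aging data out of the hot tier is simply a matter of moving a date's partition directory to the cold segment.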
Disclaimer: These guidelines are specific to the tested hardware and workloads. We strongly encourage readers to perform their own benchmarks that reflect their specific application profiles. To facilitate this, the benchmarking suite used in this study is included with the KX Nano codebase, available on GitHub.
Details¶
All benchmarks were executed in September 2025 using kdb+ 4.1 (2025.04.28) and KX Nano 6.4.1. Each kdb+ process was configured to use 8 worker threads (`-s 8`).
We used the default vector lengths of KX Nano, listed below; an illustrative write-test sketch follows the list.
* small: 63k
* medium: 127k
* large: 31m
* huge: 1000m
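As an illustration only (this is not the actual KX Nano test code), a single write test boils down to generating a vector of the given length and timing how long `set` takes to persist it. The path below is hypothetical, and the process would be started with `q -s 8` to match the benchmark configuration:

```q
n:31000000                        / "large" vector: 31 million elements
v:n?100f                          / random float vector, roughly 248 MB
\t `:/mnt/test/floatlarge set v   / milliseconds taken to persist the vector
```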
Test 1: Red Hat Enterprise Linux 9 with Intel NVMe SSD (PCIe 4.0)¶
This first test configuration utilized an Intel NVMe SSD on a server running Red Hat Enterprise Linux (RHEL) 9.3. In line with RHEL 9's officially supported file systems, the comparison was limited to ext4 and XFS.
Test Setup¶
Component | Specification |
---|---|
Storage | * Type: 3.84 TB Intel SSD D7-P5510 * Interface: PCIe 4.0 x4, NVMe * Sequential R/W: 6500 MB/s / 3400 MB/s * Random Read: 700K IOPS (4K) * Latency: Random Read: 82 µs (4K), Sequential Read / Write: 10 µs / 13 µs (4K) |
CPU | Intel(R) Xeon(R) 6747P (2 sockets, 48 cores per socket, 2 threads per core) |
Memory | 502GiB, DDR5 @ 6400 MT/s |
OS | RHEL 9.3 (kernel 5.14.0-362.8.1.el9_3.x86_64) |
The values presented in the result tables represent throughput in MB/s, where higher figures indicate better performance. The "Ratio" column quantifies the performance of XFS relative to ext4 (e.g., a value of 2 indicates XFS was twice as fast).
Write¶
We split the write results into two tables. The first table contains the "high-impact" tests and should be given more weight. These tests relate to EOD (write, sort, applying attributes) and EOI (append) work that is often the bottleneck of ingestion.
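For reference, the EOD work these tests model looks roughly like the following in q. The paths and table are hypothetical, and production systems typically use dbmaint.q or similar tooling for the sort and attribute steps:

```q
n:31000000
t:([] sym:n?`3; price:n?100f)                                 / sample trade-like table
`:/mnt/xfs/hdb/2025.09.01/trade/ set .Q.en[`:/mnt/xfs/hdb] t  / EOD write: splay with enumerated syms
`sym xasc `:/mnt/xfs/hdb/2025.09.01/trade/                    / sort the splayed table on disk
@[`:/mnt/xfs/hdb/2025.09.01/trade/;`sym;`p#]                  / apply the parted attribute to sym
```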
Single kdb+ process:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem write disk | add attribute | 261 | 233 | 1.12 |
| read write disk | disk sort | 106 | 97 | 1.09 |
| write disk | open append mid float, sync once | 1030 | 877 | 1.18 |
| | open append mid sym, sync once | 924 | 853 | 1.08 |
| | write float large | 2098 | 1304 | 1.61 |
| | write int huge | 3367 | 2170 | 1.55 |
| | write int medium | 1309 | 729 | 1.80 |
| | write int small | 474 | 380 | 1.25 |
| | write sym large | 913 | 862 | 1.06 |
| | GEOMETRIC MEAN | 779 | 608 | 1.28 |
| | MAX RATIO | 3367 | 2170 | 1.80 |
Observation: XFS is almost always faster than ext4. In critical tests, the advantage is almost 30% on average, with a maximum difference of 80%.
The performance of the less critical write operations is shown below. The Linux `sync` command synchronizes cached data to permanent storage. This data includes modified superblocks, modified inodes, delayed reads and writes, and others. EOD and EOI solutions often use sync operations to improve resiliency by ensuring data is persisted to storage and not held temporarily in caches. The sync operation is typically much faster than the `set` command because Linux writes dirty pages back in the background, so little data usually remains to flush (compare the speed of `write float large` and `sync float large`).
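A minimal way to observe this difference from q, assuming a hypothetical path (`system "sync"` simply shells out to the Linux sync command):

```q
v:31000000?100f
\t `:/mnt/test/floatlarge set v   / "write float large": most bytes land in the page cache first
\t system "sync"                  / "sync float large": flush the remaining dirty pages to the device
```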
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| write disk | append small, sync once | 750 | 482 | 1.55 |
| | append tiny, sync once | 554 | 366 | 1.51 |
| | open append small, sync once | 931 | 813 | 1.14 |
| | open append tiny, sync once | 253 | 211 | 1.20 |
| | open replace tiny, sync once | 139 | 96 | 1.45 |
| | sync column after parted attribute | 177163 | 30758180 | 0.01 |
| | sync float large | 143720 | 145828 | 0.99 |
| | sync int huge | 78917 | 77110 | 1.02 |
| | sync int medium | 40977 | 40090 | 1.02 |
| | sync int small | 6730 | 6244 | 1.08 |
| | sync sym large | 211642 | 180587 | 1.17 |
| | sync table after sort | 54142270 | 59348750 | 0.91 |
| | GEOMETRIC MEAN | 14500 | 19311 | 0.75 |
| | MAX RATIO | 54142270 | 59348750 | 1.55 |
64 kdb+ processes:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem write disk | add attribute | 12834 | 1948 | 6.59 |
| read write disk | disk sort | 3059 | 914 | 3.35 |
| write disk | open append mid float, sync once | 1933 | 1378 | 1.40 |
| | open append mid sym, sync once | 2309 | 2065 | 1.12 |
| | write float large | 63140 | 10817 | 5.84 |
| | write int huge | 2449 | 2665 | 0.92 |
| | write int medium | 40098 | 6453 | 6.21 |
| | write int small | 17609 | 4931 | 3.57 |
| | write sym large | 57674 | 14279 | 4.04 |
| | GEOMETRIC MEAN | 10110 | 3434 | 2.94 |
| | MAX RATIO | 63140 | 14279 | 6.59 |
Observation: The results show that XFS consistently and significantly outperformed ext4 in write-intensive operations. In critical ingestion and EOD tasks, write throughput on XFS was on average 3 times higher. This advantage peaked in specific operations, such as applying the `p#` attribute, where XFS was a remarkable 6.6x faster than ext4.
The performance of the less critical write operations is shown below.
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| write disk | append small, sync once | 1657 | 1955 | 0.85 |
| | append tiny, sync once | 2429 | 2083 | 1.17 |
| | open append small, sync once | 1384 | 1432 | 0.97 |
| | open append tiny, sync once | 2407 | 1361 | 1.77 |
| | open replace tiny, sync once | 531 | 1035 | 0.51 |
| | sync column after parted attribute | 132748 | 197769600 | 0.00 |
| | sync float large | 91966 | 93039 | 0.99 |
| | sync int huge | 216302 | 217269 | 1.00 |
| | sync int medium | 177815 | 127043 | 1.40 |
| | sync int small | 112098 | 105078 | 1.07 |
| | sync sym large | 137169 | 140796 | 0.97 |
| | sync table after sort | 161152200 | 423427500 | 0.38 |
| | GEOMETRIC MEAN | 37714 | 73810 | 0.51 |
| | MAX RATIO | 161152200 | 423427500 | 1.77 |
There were two minor test cases where ext4 was faster. The first, `replace tiny`, involves overwriting a very small vector. This is a very fast operation anyway, so the discrepancy is negligible, and the operation is not representative of typical, performance-critical kdb+ workloads. The second, `sync column after parted attribute`/`sync table after sort`, also showed ext4 ahead. However, the absolute time difference was minimal, making its impact on overall application performance insignificant in most practical scenarios.
Read¶
We divide the read tests into two categories, depending on the source of the data: disk vs. page cache (memory).
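To make the two categories concrete, the sketch below maps a hypothetical splayed table and touches it with random and then sequential reads. The first pass is served from storage; a rerun is served from the page cache (clearing the cache between runs requires root, e.g. `sync; echo 3 > /proc/sys/vm/drop_caches`):

```q
t:get `:/mnt/xfs/hdb/2025.09.01/trade/   / map the splayed table; columns are memory-mapped
\t t[`price] 1000000?count t             / random reads: fault in pages at arbitrary offsets
\t sum t`price                           / sequential read: scan the whole column front to back
```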
Single kdb+ process:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read disk | mmap, random read 1M | 592 | 600 | 0.99 |
| | mmap, random read 4k | 19 | 19 | 1.01 |
| | mmap, random read 64k | 198 | 196 | 1.01 |
| | random read 1M | 601 | 557 | 1.08 |
| | random read 4k | 20 | 19 | 1.06 |
| | random read 64k | 204 | 188 | 1.09 |
| | sequential read binary | 697 | 698 | 1.00 |
| read disk write mem | sequential read float large | 2012 | 799 | 2.52 |
| | sequential read int huge | 2004 | 961 | 2.09 |
| | sequential read int medium | 658 | 645 | 1.02 |
| | sequential read int small | 295 | 246 | 1.20 |
| | GEOMETRIC MEAN | 315 | 261 | 1.21 |
| | MAX RATIO | 2012 | 961 | 2.52 |
Observation: XFS reads data from disk sequentially faster than ext4. Apart from this, the differences are negligible.
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem | mmap, random read 1M | 2519 | 2465 | 1.02 |
| | mmap, random read 4k | 264 | 249 | 1.06 |
| | mmap, random read 64k | 1713 | 1755 | 0.98 |
| | random read 1M | 3021 | 3024 | 1.00 |
| | random read 4k | 1245 | 1255 | 0.99 |
| | random read 64k | 3003 | 3011 | 1.00 |
| read mem write mem | sequential read binary | 2544 | 2520 | 1.01 |
| | sequential reread float large | 14883 | 15071 | 0.99 |
| | sequential reread int huge | 33797 | 33874 | 1.00 |
| | sequential reread int medium | 7801 | 8176 | 0.95 |
| | sequential reread int small | 2148 | 2047 | 1.05 |
| | GEOMETRIC MEAN | 3123 | 3112 | 1.00 |
| | MAX RATIO | 33797 | 33874 | 1.06 |
Observation: There is no performance difference between XFS and ext4 with a single kdb+ reader if the data is coming from page cache.
64 kdb+ processes:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read disk | mmap, random read 1M | 2812 | 2819 | 1.00 |
| | mmap, random read 4k | 515 | 544 | 0.95 |
| | mmap, random read 64k | 1068 | 1075 | 0.99 |
| | random read 1M | 2779 | 2784 | 1.00 |
| | random read 4k | 543 | 546 | 0.99 |
| | random read 64k | 1065 | 1070 | 1.00 |
| | sequential read binary | 100438 | 5067 | 19.82 |
| read disk write mem | sequential read float large | 2124 | 3292 | 0.65 |
| | sequential read int huge | 3180 | 3300 | 0.96 |
| | sequential read int medium | 2164 | 5910 | 0.37 |
| | sequential read int small | 1456 | 6923 | 0.21 |
| | GEOMETRIC MEAN | 2181 | 2207 | 0.99 |
| | MAX RATIO | 100438 | 6923 | 19.82 |
Observation: Despite XFS's edge with a single reader, ext4 outperforms XFS in sequential reads when multiple kdb+ processes read different data in parallel. This scenario is common in a pool of HDBs, where multiple concurrent queries with non-selective filters result in numerous parallel sequential reads from disk.
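A query of roughly this shape (table and columns are hypothetical) illustrates the pattern: the `size>0` filter removes almost nothing, so the sym, size, and price columns of every date in the range are scanned sequentially from storage by each worker:

```q
select vwap:size wavg price by sym from trade where date within 2025.08.01 2025.08.31, size>0
```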
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem | mmap, random read 1M | 26736 | 41860 | 0.64 |
| | mmap, random read 4k | 3370 | 5458 | 0.62 |
| | mmap, random read 64k | 13300 | 22897 | 0.58 |
| | random read 1M | 124141 | 127773 | 0.97 |
| | random read 4k | 92490 | 92293 | 1.00 |
| | random read 64k | 156725 | 162621 | 0.96 |
| read mem write mem | sequential read binary | 27670 | 24988 | 1.11 |
| | sequential reread float large | 1022969 | 1073493 | 0.95 |
| | sequential reread int huge | 1358929 | 1360467 | 1.00 |
| | sequential reread int medium | 544712 | 581002 | 0.94 |
| | sequential reread int small | 124434 | 124221 | 1.00 |
| | GEOMETRIC MEAN | 94900 | 109235 | 0.87 |
| | MAX RATIO | 1358929 | 1360467 | 1.11 |
Observation: ext4 outperforms XFS in random reads, most notably the memory-mapped variants, when the data comes from the page cache.
Test 2: Ubuntu with Samsung NVMe SSD (PCIe 5.0)¶
Test setup¶
Component | Specification |
---|---|
Storage | * Type: 3.84 TB SAMSUNG MZWLO3T8HCLS-00A07 * Interface: PCIe 5.0 x4 * Sequential R/W: 14000 MB/s / 6000 MB/s * Random Read: 2500K IOPS (4K) * Latency: Random Read: 215 µs (4K Blocks), Sequential Read /Write: 436 µs / 1350 µs (4K) |
CPU | AMD EPYC 9575F (Turin), 2 sockets, 64 cores per socket, 2 threads per core, 256 MB L3 cache, SMT off |
Memory | 2.2 TB, DDR5@6400 MT/s (12 channels per socket) |
OS | Ubuntu 24.04.3 LTS (kernel: 6.8.0-83-generic) |
The values presented in the result tables are ratios of XFS throughput to the given file system's throughput, so values greater than 1 mean XFS was faster (e.g., a value of 2 indicates XFS was twice as fast).
Write¶
Single kdb+ process:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem write disk | add attribute | 1.1 | 1.1 | 1.0 | 1.0 |
| read write disk | disk sort | 1.0 | 1.1 | 1.0 | 1.0 |
| write disk | open append mid float, sync once | 1.7 | 1.7 | 2.0 | 1.0 |
| | open append mid sym, sync once | 1.2 | 1.1 | 1.2 | 1.0 |
| | write float large | 2.7 | 1.8 | 2.5 | 2.3 |
| | write int huge | 2.9 | 1.9 | 2.6 | 2.2 |
| | write int medium | 2.6 | 1.8 | 1.8 | 1.3 |
| | write int small | 0.6 | 0.6 | 0.6 | 1.0 |
| | write sym large | 1.0 | 0.9 | 0.9 | 1.7 |
| | GEOMETRIC MEAN | 1.5 | 1.2 | 1.4 | 1.3 |
| | MAX RATIO | 2.9 | 1.9 | 2.6 | 2.3 |
Observation: XFS outperforms all other file systems when a single kdb+ process writes the data. The only notable weakness for XFS was in writing small files.
The performance of the less critical write operations is shown below.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| write disk | append small, sync once | 2.2 | 2.0 | 1.4 | 2.2 |
| | append tiny, sync once | 1.4 | 1.2 | 1.0 | 1.7 |
| | open append small, sync once | 1.5 | 1.7 | 2.0 | 0.5 |
| | open append tiny, sync once | 0.9 | 0.7 | 0.8 | 1.6 |
| | open replace tiny, sync once | 2.1 | 2.1 | 1.2 | 2.8 |
| | sync float large | 1.2 | 1.4 | 1.2 | 1.6 |
| | sync int huge | 1.1 | 1.2 | 1.7 | 0.3 |
| | sync int medium | 1.5 | 1.4 | 1.3 | 2.0 |
| | sync int small | 0.8 | 1.8 | 0.9 | 0.9 |
| | sync sym large | 1.3 | 1.5 | 1.3 | 4.5 |
| | sync table after sort | 1.0 | 1.0 | 3.1 | 6.1 |
| | GEOMETRIC MEAN | 1.3 | 1.4 | 1.3 | 1.6 |
| | MAX RATIO | 2.2 | 2.1 | 3.1 | 6.1 |
64 kdb+ processes:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem write disk | add attribute | 2.9 | 2.9 | 2.9 | 1.8 |
| read write disk | disk sort | 2.1 | 2.2 | 2.0 | 1.7 |
| write disk | open append mid float, sync once | 1.1 | 1.1 | 2.7 | 1.0 |
| | open append mid sym, sync once | 1.1 | 1.1 | 2.1 | 1.7 |
| | write float large | 3.0 | 2.9 | 40.3 | 26.8 |
| | write int huge | 1.0 | 2.0 | 4.1 | 1.5 |
| | write int medium | 3.2 | 2.9 | 47.1 | 3.8 |
| | write int small | 1.4 | 5.3 | 16.4 | 2.1 |
| | write sym large | 1.2 | 1.1 | 9.7 | 9.5 |
| | GEOMETRIC MEAN | 1.7 | 2.1 | 7.0 | 2.9 |
| | MAX RATIO | 3.2 | 5.3 | 47.1 | 26.8 |
Observation: XFS significantly outperforms all other file systems, sometimes by a wide margin: for example, persisting a large float vector (the `set` operation) is almost 27 times faster on XFS than on ZFS.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| write disk | append small, sync once | 1.4 | 1.5 | 3.8 | 2.5 |
| | append tiny, sync once | 0.8 | 1.3 | 7.3 | 1.2 |
| | open append small, sync once | 1.0 | 1.0 | 3.4 | 0.8 |
| | open append tiny, sync once | 0.8 | 1.5 | 11.5 | 1.2 |
| | open replace tiny, sync once | 3.2 | 24.0 | 4.8 | 6.2 |
| | sync float large | 1.0 | 1.0 | 0.6 | 0.5 |
| | sync int huge | 1.0 | 1.3 | 0.5 | 0.0 |
| | sync int medium | 1.0 | 0.7 | 3.0 | 1.3 |
| | sync int small | 0.8 | 1.2 | 3.0 | 1.1 |
| | sync sym large | 1.0 | 0.9 | 0.6 | 1.2 |
| | sync table after sort | 0.6 | 44.7 | 11.3 | 11.1 |
| | GEOMETRIC MEAN | 1.0 | 2.1 | 2.8 | 1.1 |
| | MAX RATIO | 3.2 | 44.7 | 11.5 | 11.1 |
Read¶
Single kdb+ process:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read disk | mmap, random read 1M | 1.1 | 4.4 | 1.1 | 2.1 |
| | mmap, random read 4k | 1.0 | 1.1 | 1.0 | 0.9 |
| | mmap, random read 64k | 0.9 | 6.4 | 1.0 | 0.8 |
| | random read 1M | 1.0 | 4.4 | 1.1 | 2.3 |
| | random read 4k | 1.0 | 1.2 | 1.1 | 0.8 |
| | random read 64k | 0.9 | 6.7 | 1.0 | 0.8 |
| | sequential read binary | 2.1 | 6.4 | 2.0 | 2.2 |
| read disk write mem | sequential read float large | 1.1 | 0.8 | 1.3 | 3.3 |
| | sequential read int huge | 1.2 | 0.8 | 1.2 | 4.7 |
| | sequential read int medium | 1.2 | 1.7 | 1.2 | 5.8 |
| | sequential read int small | 0.7 | 0.9 | 1.3 | 1.6 |
| | GEOMETRIC MEAN | 1.1 | 2.2 | 1.2 | 1.8 |
| | MAX RATIO | 2.1 | 6.7 | 2.0 | 5.8 |
Observation: XFS excels in reading from disk if there is a single kdb+ reader.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem | mmap, random read 1M | 1.2 | 1.3 | 1.4 | 1.4 |
| | mmap, random read 4k | 1.1 | 1.1 | 1.0 | 1.0 |
| | mmap, random read 64k | 1.1 | 1.1 | 1.1 | 1.1 |
| | random read 1M | 1.0 | 1.2 | 1.1 | 1.2 |
| | random read 4k | 1.0 | 1.1 | 1.1 | 0.9 |
| | random read 64k | 1.0 | 1.1 | 1.2 | 1.1 |
| read mem write mem | sequential read binary | 0.9 | 0.9 | 0.9 | 1.1 |
| | sequential reread float large | 1.7 | 2.1 | 2.5 | 2.5 |
| | sequential reread int huge | 2.0 | 2.2 | 1.9 | 2.3 |
| | sequential reread int medium | 1.6 | 2.7 | 1.5 | 2.4 |
| | sequential reread int small | 0.7 | 0.7 | 0.6 | 1.1 |
| | GEOMETRIC MEAN | 1.1 | 1.3 | 1.2 | 1.4 |
| | MAX RATIO | 2.0 | 2.7 | 2.5 | 2.5 |
Observation: XFS excels in reading from page cache if there is a single kdb+ reader.
64 kdb+ processes:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read disk | mmap, random read 1M | 1.0 | 2.3 | 0.9 | 0.4 |
| | mmap, random read 4k | 1.0 | 1.1 | 0.9 | 0.7 |
| | mmap, random read 64k | 0.8 | 1.5 | 0.7 | 0.2 |
| | random read 1M | 1.0 | 2.2 | 0.9 | 0.5 |
| | random read 4k | 1.0 | 1.0 | 0.9 | 0.5 |
| | random read 64k | 0.8 | 1.5 | 0.8 | 0.2 |
| | sequential read binary | 9.1 | 8.6 | 9.0 | 6.5 |
| read disk write mem | sequential read float large | 0.8 | 0.7 | 0.8 | 0.5 |
| | sequential read int huge | 0.9 | 0.8 | 0.9 | 0.6 |
| | sequential read int medium | 1.0 | 0.8 | 0.9 | 1.1 |
| | sequential read int small | 1.2 | 0.5 | 2.0 | 1.1 |
| | GEOMETRIC MEAN | 1.1 | 1.3 | 1.2 | 0.6 |
| | MAX RATIO | 9.1 | 8.6 | 9.0 | 6.5 |
Observation: ZFS excels in reading from disk when many kdb+ processes (an HDB pool) read data in parallel. The only exception is the binary read (`read1`), but this is not considered a typical query pattern in a production kdb+ environment.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem | mmap, random read 1M | 1.1 | 1.1 | 1.1 | 1.2 |
| | mmap, random read 4k | 1.0 | 1.0 | 1.0 | 2.2 |
| | mmap, random read 64k | 1.0 | 1.0 | 1.0 | 2.1 |
| | random read 1M | 1.1 | 1.1 | 1.1 | 1.1 |
| | random read 4k | 1.0 | 1.0 | 1.0 | 0.9 |
| | random read 64k | 1.1 | 1.1 | 1.1 | 1.1 |
| read mem write mem | sequential read binary | 1.0 | 1.0 | 1.0 | 1.1 |
| | sequential reread float large | 1.9 | 1.9 | 2.3 | 5.4 |
| | sequential reread int huge | 1.8 | 1.8 | 2.2 | 2.1 |
| | sequential reread int medium | 2.0 | 2.2 | 2.0 | 8.3 |
| | sequential reread int small | 1.1 | 1.1 | 6.8 | 1.8 |
| | GEOMETRIC MEAN | 1.2 | 1.2 | 1.5 | 1.9 |
| | MAX RATIO | 2.0 | 2.2 | 6.8 | 8.3 |
Observation: The performance advantage of ZFS vanishes entirely when data is served from the page cache.