Overview¶
In this document, we compare compression algorithms using a popular financial dataset from the New York Stock Exchange (NYSE). There are three key metrics to evaluate compression algorithms.
- Compression ratio
- Compression speed
- Decompression speed
These metrics impact storage cost, data write time and query response times respectively. Both compression and decompression speeds depend on the hardware, primarily on storage speed and CPU capacity. Our partner, Intel(R), provided access to two systems with different storage characteristics in its FasterLab, a facility dedicated to optimizing Financial Services Industry (FSI) solutions. The first system has fast local disks, while the second uses slower NFS storage. The next sections describe the results in detail.
Compression ratios¶
Compression ratio measures the relative reduction in data size. This ratio is calculated by dividing the uncompressed size by the compressed size. For example, a ratio of 4 indicates that the data consumes a quarter of the disk space after compression. In this document, we show relative sizes after compression, which is the inverse of the compression ratio, expressed as a percentage. Lower values indicate better compression; 25 corresponds to a compression ratio of 4. The block size parameter was set to 17, which translates to a logical block size of 2^17 bytes = 128 KB.
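In kdb+, the compression parameters can be supplied globally via .z.zd or per file as the left argument of set. A minimal sketch with illustrative paths and data, assuming the usual algorithm codes (0 none, 1 q IPC, 2 gzip, 3 snappy, 4 lz4hc, 5 zstd):

```q
/ sketch: 17 = logical block size exponent (2 xexp 17 = 128kB blocks)
.z.zd:17 2 6                        / default for all writes: gzip level 6
`:gzipped set til 1000000           / compressed using the .z.zd defaults
(`:zstded;17;5;10) set til 1000000  / per-file override: zstd level 10
/ -21! reports compression statistics of a compressed file;
/ relative size = compressedLength % uncompressedLength
{x[`compressedLength]%x`uncompressedLength} -21!`:zstded
```

The -21! statistics are how the relative sizes in the tables below can be reproduced.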
The table-level results are presented below.

zstd outperforms lz4 and snappy by nearly 2x, though it is only marginally better than gzip.
The following tables provide a column-level breakdown. The columns are ordered by entropy, in decreasing order. High-entropy columns compress poorly and dominate the compressed table size, so the columns at the top of the table contribute most to overall disk usage; the low-entropy columns near the bottom compress almost completely with every algorithm. Gradient background coloring highlights results (dark red = poor compression).
Table quote:
| Compression Algorithm | gzip | lz4 | qipc | snappy | zstd | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compression Level | 1 | 5 | 6 | 9 | 1 | 5 | 9 | 12 | 16 | 0 | 0 | -7 | 1 | 10 | 12 | 14 | 22 |
| Participant_Timestamp | 46.8 | 44.9 | 45.0 | 45.2 | 70.2 | 69.5 | 69.0 | 68.9 | 68.9 | 100.0 | 71.5 | 96.5 | 41.3 | 40.9 | 40.9 | 41.0 | 41.0 |
| Time | 38.5 | 36.3 | 36.3 | 36.4 | 61.3 | 61.1 | 60.5 | 60.4 | 60.4 | 82.5 | 62.4 | 81.6 | 31.3 | 31.3 | 31.3 | 33.8 | 33.8 |
| Sequence_Number | 41.3 | 41.1 | 41.2 | 41.2 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 81.4 | 81.4 | 81.4 | 35.1 | 36.8 |
| Offer_Price | 8.6 | 6.9 | 6.6 | 6.5 | 13.5 | 11.1 | 10.1 | 9.6 | 9.6 | 17.5 | 15.9 | 16.3 | 7.7 | 6.4 | 6.4 | 6.1 | 5.6 |
| Bid_Price | 8.6 | 6.9 | 6.6 | 6.5 | 13.5 | 11.1 | 10.1 | 9.6 | 9.6 | 17.5 | 15.8 | 16.3 | 7.6 | 6.4 | 6.4 | 6.1 | 5.6 |
| Symbol | 0.6 | 0.2 | 0.2 | 0.2 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 1.7 | 4.7 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| Offer_Size | 17.1 | 14.9 | 14.3 | 13.5 | 29.8 | 23.3 | 19.2 | 17.1 | 17.1 | 35.3 | 34.3 | 28.0 | 16.3 | 13.6 | 13.0 | 12.7 | 11.8 |
| Bid_Size | 16.9 | 14.7 | 14.0 | 13.2 | 29.4 | 23.0 | 18.9 | 16.8 | 16.8 | 34.8 | 33.9 | 27.7 | 16.0 | 13.4 | 12.8 | 12.5 | 11.6 |
| Exchange | 47.3 | 44.6 | 44.2 | 44.0 | 65.7 | 60.0 | 58.7 | 57.8 | 57.8 | 99.9 | 71.4 | 83.1 | 41.8 | 43.9 | 43.6 | 40.5 | 40.3 |
| National_BBO_Ind | 20.1 | 15.5 | 14.8 | 13.5 | 33.2 | 25.5 | 20.4 | 17.0 | 17.0 | 80.7 | 30.4 | 31.8 | 18.1 | 14.8 | 13.7 | 13.1 | 12.7 |
| Short_Sale_Restriction_Indicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Source_Of_Quote | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Retail_Interest_Indicator | 12.6 | 9.6 | 9.2 | 8.4 | 18.8 | 14.7 | 12.1 | 10.5 | 10.5 | 34.0 | 19.2 | 21.1 | 11.0 | 9.0 | 8.6 | 8.3 | 7.9 |
| National_BBO_LULD_Indicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Quote_Condition | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.9 | 4.7 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| SIP_Generated_Message_Identifier | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Security_Status_Indicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| LULD_BBO_Indicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| FINRA_BBO_Indicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Quote_Cancel_Correction | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| FINRA_ADF_MPID_Indicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| FINRA_ADF_Market_Participant_Quote_Indicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| FINRA_ADF_Timestamp | 0.6 | 0.2 | 0.2 | 0.2 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 2.4 | 4.7 | 0.0 | 3.2 | 0.0 | 0.0 | 0.0 | 0.0 |
Table trade:
| Compression Algorithm | gzip | lz4 | qipc | snappy | zstd | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compression Level | 1 | 5 | 6 | 9 | 1 | 5 | 9 | 12 | 16 | 0 | 0 | -7 | 1 | 10 | 12 | 14 | 22 |
| Time | 45.5 | 43.2 | 43.2 | 43.4 | 67.0 | 66.8 | 66.3 | 66.2 | 66.2 | 97.8 | 68.6 | 86.2 | 39.5 | 39.6 | 39.6 | 41.5 | 41.7 |
| ParticipantTimestamp | 46.8 | 44.5 | 44.6 | 44.8 | 63.5 | 63.1 | 62.9 | 62.9 | 62.9 | 99.2 | 66.4 | 80.7 | 43.4 | 42.7 | 42.7 | 40.7 | 40.3 |
| SequenceNumber | 44.0 | 43.8 | 43.8 | 43.8 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 80.3 | 80.3 | 80.3 | 39.2 | 43.1 |
| TradeId | 26.1 | 22.3 | 22.1 | 22.0 | 43.8 | 42.7 | 42.5 | 42.3 | 42.3 | 75.8 | 47.7 | 39.1 | 21.4 | 18.6 | 18.6 | 16.6 | 16.5 |
| TradePrice | 19.6 | 16.9 | 16.7 | 16.6 | 28.6 | 24.1 | 22.9 | 22.4 | 22.4 | 30.2 | 29.9 | 36.5 | 20.2 | 17.1 | 17.1 | 16.0 | 15.2 |
| Symbol | 0.6 | 0.2 | 0.2 | 0.2 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 1.7 | 4.7 | 0.1 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 |
| TradeReportingFacilityTRFTimestamp | 20.9 | 19.8 | 19.7 | 19.7 | 26.0 | 24.8 | 24.1 | 24.1 | 24.1 | 29.9 | 28.7 | 31.0 | 23.6 | 19.5 | 19.4 | 18.7 | 18.5 |
| TradeVolume | 29.3 | 26.8 | 25.7 | 23.9 | 40.1 | 32.5 | 29.2 | 27.8 | 27.8 | 49.4 | 45.4 | 37.7 | 27.3 | 23.8 | 23.1 | 22.9 | 22.0 |
| Exchange | 42.7 | 39.3 | 38.7 | 38.1 | 58.7 | 52.2 | 50.1 | 48.9 | 48.9 | 100.0 | 60.7 | 68.1 | 38.9 | 38.3 | 37.9 | 36.2 | 35.8 |
| SaleCondition | 7.7 | 5.9 | 5.2 | 4.6 | 13.9 | 9.9 | 7.5 | 6.2 | 6.2 | 15.2 | 16.5 | 18.1 | 7.8 | 5.3 | 4.8 | 4.9 | 4.2 |
| SourceofTrade | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.9 | 4.7 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 |
| TradeThroughExemptIndicator | 18.5 | 13.6 | 13.0 | 12.0 | 32.6 | 24.8 | 19.2 | 15.4 | 15.4 | 87.2 | 29.8 | 29.4 | 16.5 | 13.3 | 12.5 | 11.7 | 11.1 |
| TradeReportingFacility | 9.8 | 7.1 | 6.8 | 6.3 | 15.8 | 12.3 | 9.6 | 8.1 | 8.1 | 25.3 | 16.5 | 17.0 | 8.3 | 6.8 | 6.4 | 6.1 | 5.6 |
| TradeCorrectionIndicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| TradeStopStockIndicator | 0.5 | 0.1 | 0.1 | 0.1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.8 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
zstd excels at column-level compression, though its advantage over gzip is less pronounced at the table level. This discrepancy arises because minor differences in highly compressible columns (e.g. 0.025% vs. 0.1% relative size) have negligible impact on overall storage, whereas larger columns dominate. For example:
Sequence_Number, a typical capital markets column (monotonically increasing integers with repetitions), shows a stark contrast:
- gzip: ~40% relative size
- zstd: ~80% relative size (except at the highest compression levels)
- lz4/snappy/qipc: no compression (100% relative size)
qipc does not compress all columns by default. The conditions under which qipc applies compression are documented precisely.
Key Observations¶
- gzip and zstd deliver the best overall ratios. gzip significantly outperforms zstd for Sequence_Number (except at zstd's highest levels).
- zstd excels at small-file compression, which can be particularly useful for columns with many repetitions (e.g., wide schemas in IoT applications).
- The best compression ratio can be achieved by using mixed compression strategies. You can set a different compression for each column by passing a dictionary to .z.zd.
- gzip levels 6–9 show minimal difference, but level 1 performs poorly on low-entropy columns.
- qipc has the worst compression ratio among the tested algorithms.
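As a sketch of such a mixed strategy, assuming the dictionary form of .z.zd maps column file names to parameter triples, with the null symbol key as the default (the names and settings here are illustrative, not a recommendation for every schema):

```q
/ hypothetical per-column settings: gzip level 5 for the sequence-number
/ column, zstd level 10 (algorithm code 5) for everything else
.z.zd:(`Sequence_Number;`)!(17 2 5;17 5 10)
```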
Write speed, compression times¶
The typical bottleneck of data ingestion is persisting tables to storage. The write time determines the maximum ingestion rate.
Writing compressed data to storage involves three sequential steps:
- Serializing the data
- Compressing the serialized bytes
- Persisting the compressed output
These steps are executed by the set command in kdb+. Although the underlying compression library (e.g. gzip, zstd) may support multithreading, set is single-threaded.
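A minimal sketch of both write paths (the table and paths are illustrative):

```q
t:([] time:.z.p+til 1000; price:1000?100e; size:1000?100i)
`:trade/ set t               / uncompressed: serialize and persist
/ compressed: (target; block size exponent; algorithm; level),
/ e.g. 128kB blocks, algorithm code 2 = gzip, level 6
(`:tradeZip/;17;2;6) set t
```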
The tables below show compression time ratios relative to an uncompressed set. A value of e.g. 2 means that compressing and saving the table takes twice as long as saving (memory-mapping) it uncompressed. Smaller numbers are better.
Results on system with block storage:

The following tables provide a column-level breakdown. Green cells mark speed improvement over uncompressed set, red cells highlight significant slowdown.
| Compression Algorithm | gzip | lz4 | qipc | snappy | zstd | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compression Level | 1 | 5 | 6 | 9 | 1 | 5 | 9 | 12 | 16 | 0 | 0 | -7 | 1 | 10 | 12 | 14 | 22 |
| Time | 14.3 | 23.5 | 34.9 | 171.3 | 13.7 | 14.6 | 21.6 | 89.9 | 89.5 | 2.4 | 2.9 | 1.8 | 3.6 | 19.6 | 38.4 | 69.8 | 143.1 |
| ParticipantTimestamp | 16.6 | 26.2 | 38.4 | 138.1 | 15.2 | 19.0 | 22.0 | 74.8 | 74.2 | 2.8 | 3.2 | 1.8 | 3.9 | 21.3 | 33.0 | 62.5 | 158.6 |
| SequenceNumber | 19.8 | 35.2 | 43.3 | 46.9 | 23.3 | 22.6 | 26.5 | 28.3 | 27.6 | 2.0 | 0.7 | 0.8 | 1.8 | 7.1 | 5.6 | 65.7 | 113.3 |
| TradeId | 2.7 | 6.5 | 12.9 | 74.8 | 2.8 | 4.2 | 9.3 | 49.0 | 48.7 | 1.3 | 1.3 | 1.4 | 1.5 | 8.8 | 20.1 | 19.4 | 127.5 |
| TradePrice | 8.5 | 17.3 | 27.4 | 53.7 | 8.0 | 13.8 | 35.0 | 65.1 | 65.1 | 2.5 | 2.3 | 2.5 | 3.2 | 27.2 | 67.6 | 84.5 | 374.2 |
| Symbol | 2.2 | 4.8 | 5.2 | 5.5 | 1.5 | 0.6 | 1.5 | 6.1 | 6.1 | 0.8 | 0.4 | 0.3 | 0.4 | 0.7 | 0.7 | 3.6 | 4.5 |
| TradeReportingFacilityTRFTimestamp | 7.5 | 11.4 | 15.3 | 88.8 | 7.6 | 8.8 | 25.3 | 368.5 | 368.9 | 2.0 | 1.7 | 1.6 | 2.4 | 12.9 | 64.1 | 54.5 | 482.5 |
| TradeVolume | 12.4 | 33.5 | 81.1 | 877.8 | 8.9 | 23.9 | 148.2 | 519.5 | 518.5 | 2.4 | 2.8 | 3.5 | 3.9 | 44.6 | 122.9 | 163.9 | 436.5 |
| Exchange | 15.3 | 55.5 | 117.5 | 525.3 | 15.8 | 39.6 | 171.6 | 294.0 | 295.0 | 3.6 | 4.4 | 3.8 | 5.0 | 62.2 | 104.0 | 156.7 | 362.6 |
| SaleCondition | 4.5 | 9.2 | 20.5 | 197.3 | 6.3 | 13.0 | 99.2 | 537.3 | 537.4 | 1.6 | 1.5 | 2.1 | 2.3 | 26.6 | 128.4 | 118.5 | 712.7 |
| SourceofTrade | 2.2 | 4.4 | 4.7 | 5.1 | 2.5 | 3.7 | 1.7 | 3.7 | 3.6 | 0.7 | 0.3 | 0.3 | 0.3 | 1.1 | 0.9 | 3.0 | 18.7 |
| TradeThroughExemptIndicator | 11.0 | 42.7 | 97.4 | 2104.0 | 12.4 | 35.6 | 347.6 | 3033.8 | 3035.3 | 4.7 | 2.1 | 3.2 | 3.3 | 102.5 | 200.0 | 216.9 | 784.7 |
| TradeReportingFacility | 5.9 | 19.1 | 39.5 | 610.3 | 7.1 | 17.6 | 156.2 | 900.3 | 901.3 | 2.5 | 1.2 | 1.7 | 1.8 | 41.2 | 92.6 | 94.0 | 374.5 |
| TradeCorrectionIndicator | 2.4 | 4.6 | 4.9 | 4.7 | 0.6 | 0.6 | 0.7 | 0.9 | 1.2 | 0.8 | 0.4 | 0.4 | 0.4 | 1.0 | 0.8 | 3.0 | 3.9 |
| TradeStopStockIndicator | 2.9 | 5.4 | 5.8 | 5.4 | 0.5 | 0.3 | 0.5 | 0.5 | 0.5 | 1.0 | 0.5 | 0.4 | 0.4 | 1.2 | 0.9 | 3.5 | 1.8 |
Key Observations¶
- Compression typically slows down set operations.
- Notable exceptions: snappy and zstd level 1 actually improve write speed for certain column types. For these columns, zstd provides significantly better compression ratios than snappy.
- The level has a substantial impact on compression time, even for algorithms like lz4; for example, zstd level 10 is considerably faster than level 22.
- Higher compression levels rarely justify the performance cost, offering minimal improvement in ratio at the expense of significantly slower compression.
- zstd level 1 offers the fastest compression.
- Although the general principle that lower zstd levels equate to faster speeds (with reduced compression) holds true, the kdb+ wrapper introduces exceptions, making it challenging to pinpoint the optimal compression level.
Let us see how compression performs on slower storage.
| Compression Algorithm | gzip | lz4 | qipc | snappy | zstd | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compression Level | 1 | 5 | 6 | 9 | 1 | 5 | 9 | 12 | 16 | 0 | 0 | -7 | 1 | 10 | 12 | 14 | 22 |
| Time | 1.9 | 2.9 | 4.4 | 22.3 | 1.8 | 2.2 | 2.7 | 12.2 | 12.1 | 1.2 | 1.0 | 0.9 | 0.7 | 2.3 | 4.4 | 8.3 | 16.4 |
| ParticipantTimestamp | 2.0 | 2.9 | 4.4 | 16.7 | 1.8 | 2.2 | 2.5 | 7.8 | 8.3 | 1.2 | 1.0 | 0.9 | 0.8 | 2.4 | 3.7 | 6.9 | 16.4 |
| SequenceNumber | 2.1 | 3.6 | 4.8 | 5.3 | 2.7 | 2.9 | 2.9 | 3.1 | 3.1 | 1.1 | 1.0 | 1.1 | 0.9 | 1.3 | 1.2 | 6.9 | 11.5 |
| TradeId | 1.1 | 2.2 | 4.8 | 30.2 | 1.1 | 1.5 | 3.1 | 23.2 | 23.2 | 1.0 | 0.8 | 0.8 | 0.7 | 3.1 | 7.6 | 7.2 | 46.7 |
| TradePrice | 1.0 | 1.8 | 2.9 | 5.8 | 0.9 | 1.6 | 3.7 | 7.0 | 7.0 | 0.5 | 0.5 | 0.6 | 0.5 | 2.7 | 7.0 | 8.7 | 36.1 |
| Symbol | 0.2 | 0.3 | 0.3 | 0.4 | 0.1 | 0.1 | 0.1 | 0.5 | 0.5 | 0.1 | 0.1 | 0.0 | 0.0 | 0.1 | 0.1 | 0.3 | 0.4 |
| TradeReportingFacilityTRFTimestamp | 0.9 | 1.2 | 1.7 | 10.6 | 0.8 | 1.0 | 2.6 | 50.2 | 50.2 | 0.5 | 0.4 | 0.4 | 0.4 | 1.4 | 6.9 | 6.0 | 47.2 |
| TradeVolume | 1.4 | 3.3 | 7.7 | 98.2 | 1.1 | 2.6 | 13.4 | 58.5 | 58.4 | 0.7 | 0.8 | 0.7 | 0.6 | 4.3 | 11.9 | 16.9 | 42.9 |
| Exchange | 1.9 | 5.6 | 11.4 | 54.0 | 1.9 | 4.3 | 16.7 | 28.7 | 28.6 | 1.4 | 1.1 | 1.1 | 0.8 | 6.3 | 10.4 | 15.8 | 35.6 |
| SaleCondition | 0.5 | 0.9 | 2.2 | 22.3 | 0.6 | 1.5 | 10.7 | 60.3 | 60.2 | 0.3 | 0.3 | 0.4 | 0.3 | 2.7 | 13.1 | 12.3 | 69.7 |
| SourceofTrade | 0.3 | 0.4 | 0.4 | 0.4 | 0.1 | 0.3 | 0.2 | 0.2 | 0.2 | 0.1 | 0.1 | 0.0 | 0.0 | 0.1 | 0.1 | 0.4 | 1.6 |
| TradeThroughExemptIndicator | 1.3 | 4.5 | 10.3 | 211.2 | 1.5 | 3.7 | 34.1 | 306.3 | 306.2 | 1.3 | 0.6 | 0.7 | 0.6 | 9.4 | 18.8 | 20.3 | 71.0 |
| TradeReportingFacility | 0.7 | 2.1 | 4.5 | 67.3 | 0.7 | 1.9 | 16.8 | 100.7 | 100.6 | 0.5 | 0.3 | 0.4 | 0.3 | 4.2 | 9.9 | 9.9 | 37.8 |
| TradeCorrectionIndicator | 0.2 | 0.3 | 0.3 | 0.3 | 0.0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.0 | 0.1 | 0.1 | 0.3 | 0.4 |
| TradeStopStockIndicator | 0.2 | 0.4 | 0.4 | 0.4 | 0.0 | 0.1 | 0.0 | 0.0 | 0.1 | 0.2 | 0.1 | 0.0 | 0.0 | 0.1 | 0.1 | 0.4 | 0.3 |
These results — smaller ratios compared to uncompressed set and more green cells — indicate that the performance benefits of compression are amplified on slower disks. Notably, only zstd at level 1 consistently outperforms uncompressed set across all columns, while other compression methods generally slow down the set operation.
Scaling, syncing and appending¶
Because the set command is single-threaded, kdb+ systems often persist columns in parallel with peach when memory allows. In our case, the number of columns is smaller than the number of available cores, so parallelizing provided a clear speed advantage: persisting all columns simultaneously took roughly the same time as persisting the largest column (TradeId). In practice, the writer process may have other responsibilities, like ingesting new data or serving queries, which also compete for CPU.
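A sketch of per-column parallel persistence (directory and table names are illustrative; this requires secondary threads, e.g. starting q with -s 8):

```q
/ write each column of the in-memory table trade to its own file,
/ distributing the columns across secondary threads with peach
{[t;c] (` sv `:db/2022.03.31/trade,c) set t c}[trade] peach cols trade
```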
Data persisted via set may remain in the OS buffer cache before being written to disk, risking data loss if the system crashes. The user can trigger the flush with the fsync system call. If kdb+ processes wrote several files simultaneously and a consistent state is desired, then the fsync and syncfs system calls are used. These calls block the kdb+ process, so their execution time contributes to the write time. In our experiment, fsync times were marginal compared to set, especially on NFS: the network is the bottleneck there, so the underlying storage system has plenty of time to flush the data.
While set is a common persistence method, intraday writedowns often use appends, implemented by amend at, as in .[file; (); ,; chunk]. set also chunks large vector writes behind the scenes, which explains why our test showed no speed difference between the two methods, regardless of compression.
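A sketch of the append form (the file path and chunk are illustrative):

```q
chunk:100?100e                                / new data to append
.[`:db/2022.03.31/trade/price; (); ,; chunk]  / append in place with amend at
```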
Query response times¶
When data is stored in a compressed format, it must be decompressed before processing queries. The decompression speed directly impacts query execution time.
In the query test, we executed 14 distinct queries that vary in filtering, grouping and aggregation parameters. Some filters trigger sequential reads, while queries with several filtering constraints perform random reads. We also included queries with explicit parallel iteration (peach) and with as-of joins. Financial data streams in from multiple sources simultaneously, and as-of joins play a critical role in joining the various tables.
The table below details each query’s performance metrics:
- Elapsed time in milliseconds (measured via \ts)
- Storage read in KB (returned by iostat)
- Memory needed by the query in KB (output of \ts)
- Memory needed by the result table in KB
| Query | Elapsed time ms | Storage read (KB) | Query memory need (KB) | Result memory need (KB) |
|---|---|---|---|---|
| select from quote where date=2022.03.31, i<500000000 | 13379 | 36137692 | 44023417 | 39728448 |
| aj[`sym`time`ex; select from tradeNorm where date=2022.03.31, size>500000; select from quoteNorm where date=2022.03.31] | 6845 | 2820024 | 9797897 | 82 |
| select nr: count i, avgMid: avg (bid + ask) % 2 by sym from quoteNorm where date=2022.03.31 | 5572 | 13914952 | 42950069 | 327 |
| select from tradeNorm where date=2022.03.31, i<>0 | 553 | 1721124 | 4160752 | 3087008 |
| aj[`Symbol`Time; select from trade where date=2022.03.31, Symbol in someSyms; select from quote where date=2022.03.31] | 438 | 62524 | 8591330 | 1210 |
| distinct select sym, ex from tradeNorm where date=2022.03.31, size > 700000 | 343 | 379256 | 1207962 | 2 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 76 | 2736 | 1451 | 491 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 49 | 1996 | 38 | 3 |
| select bsize wavg bid, asize wavg ask from quoteNorm where date=2022.03.31, sym in someSyms | 23 | 10888 | 4197 | 0 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 19 | 2724 | 564 | 491 |
| select from quote where date=2022.03.31, Symbol=`VFVA | 16 | 9400 | 10226 | 9699 |
| raze {select from quoteNorm where date=2022.03.31, sym=x, 4000<bsize+asize} peach infreqIdList | 16 | 1204 | 9 | 0 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 12 | 1996 | 15 | 3 |
| select medMidSize: med (bsize + asize) % 2 from quoteNorm where date=2022.03.31, sym=`CIIG.W | 2 | 464 | 35 | 0 |
We started the kdb+ processes with numactl -N 0 -m 0 and 144 threads (-s 144). The Linux kernel parameter read_ahead_kb was set to 128. Query time ratios (shown in subsequent tables) compare performance with and without compression. A value of, for example, 2 means that the query runs twice as fast without compression. Lower ratios are better. Dark red highlighting denotes significant slowdowns under compression. The queries are sorted in descending order of storage read (iostat output).
To isolate caching effects, we cleared the page cache (echo 3 | sudo tee /proc/sys/vm/drop_caches) and executed the queries twice. Data came from the storage during the first execution, then from the page cache (memory).
| Compression Algorithm | gzip | lz4 | qipc | snappy | zstd | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compression Level | 1 | 5 | 6 | 9 | 1 | 5 | 9 | 12 | 16 | 0 | 0 | -7 | 1 | 10 | 12 | 14 | 22 |
| query | |||||||||||||||||
| select from quote where date=2022.03.31, i<500000000 | 4.7 | 4.7 | 4.7 | 4.7 | 4.7 | 4.6 | 4.6 | 4.6 | 4.6 | 4.7 | 4.6 | 4.7 | 4.7 | 4.7 | 4.7 | 4.7 | 4.7 |
| select nr: count i, avgMid: avg (bid + ask) % 2 by sym from quoteNorm where date=2022.03.31 | 4.0 | 4.1 | 4.1 | 4.1 | 4.0 | 4.0 | 3.9 | 3.9 | 3.9 | 4.1 | 4.1 | 4.1 | 4.1 | 4.1 | 4.1 | 4.1 | 4.1 |
| aj[`sym`time`ex; select from tradeNorm where date=2022.03.31, size>500000; select from quoteNorm where date=2022.03.31] | 1.9 | 1.8 | 1.8 | 1.7 | 1.1 | 1.1 | 1.1 | 1.1 | 1.1 | 1.5 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.3 | 1.3 |
| select from tradeNorm where date=2022.03.31, i<>0 | 5.4 | 5.5 | 5.4 | 5.4 | 5.4 | 5.4 | 5.4 | 5.4 | 5.4 | 5.3 | 5.3 | 5.4 | 5.5 | 5.5 | 5.5 | 5.5 | 5.5 |
| distinct select sym, ex from tradeNorm where date=2022.03.31, size > 700000 | 2.6 | 2.6 | 2.6 | 2.6 | 2.2 | 2.2 | 2.1 | 2.1 | 2.1 | 2.3 | 2.4 | 2.2 | 2.2 | 2.2 | 2.2 | 2.2 | 2.2 |
| aj[`Symbol`Time; select from trade where date=2022.03.31, Symbol in someSyms; select from quote where date=2022.03.31] | 1.3 | 1.4 | 1.3 | 1.3 | 1.1 | 1.1 | 1.0 | 1.0 | 1.0 | 1.1 | 1.1 | 1.2 | 1.2 | 1.3 | 1.2 | 1.3 | 1.2 |
| select bsize wavg bid, asize wavg ask from quoteNorm where date=2022.03.31, sym in someSyms | 2.2 | 1.9 | 2.0 | 1.9 | 1.4 | 1.3 | 1.2 | 1.2 | 1.3 | 2.0 | 1.7 | 1.7 | 1.5 | 1.5 | 1.3 | 1.6 | 1.4 |
| select from quote where date=2022.03.31, Symbol=`VFVA | 2.6 | 2.5 | 2.6 | 2.5 | 1.8 | 1.8 | 1.8 | 1.9 | 1.8 | 1.9 | 1.8 | 2.5 | 2.6 | 2.6 | 2.6 | 2.6 | 2.7 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 5.6 | 5.6 | 5.6 | 5.5 | 3.9 | 3.7 | 3.7 | 3.6 | 3.4 | 4.7 | 4.1 | 4.8 | 4.8 | 4.7 | 4.6 | 4.8 | 4.6 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 12.1 | 12.5 | 12.2 | 11.7 | 11.8 | 11.5 | 11.6 | 11.6 | 12.3 | 10.4 | 11.0 | 13.9 | 13.8 | 13.8 | 15.7 | 14.2 | 12.5 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 5.8 | 5.7 | 5.7 | 5.6 | 4.1 | 3.9 | 4.1 | 3.8 | 3.7 | 5.4 | 4.5 | 5.2 | 5.4 | 5.1 | 5.0 | 5.3 | 5.0 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 12.8 | 13.3 | 14.3 | 13.3 | 12.8 | 14.8 | 13.8 | 13.2 | 14.8 | 14.9 | 14.7 | 15.1 | 15.9 | 18.4 | 16.5 | 14.4 | 14.7 |
| raze {select from quoteNorm where date=2022.03.31, sym=x, 4000<bsize+asize} peach infreqIdList | 8.9 | 8.4 | 8.4 | 9.2 | 8.2 | 8.1 | 8.4 | 8.0 | 8.1 | 8.4 | 8.2 | 9.0 | 7.8 | 8.2 | 9.6 | 9.8 | 9.6 |
| select medMidSize: med (bsize + asize) % 2 from quoteNorm where date=2022.03.31, sym=`CIIG.W | 3.0 | 3.0 | 3.5 | 3.5 | 2.5 | 2.5 | 3.0 | 2.5 | 3.0 | 2.5 | 3.0 | 2.5 | 2.5 | 3.0 | 2.5 | 3.0 | 2.5 |
The table below displays the second execution of the queries, that is, data was sourced from memory. Because these runs used the page cache, the impact of storage speed is smaller. The query select medMidSize: med (bsize + asize) % 2 from quoteNorm where date=2022.03.31, sym=`CIIG.W executed in less than 1 msec without compression, so we rounded the execution time up to 1 msec to avoid division by zero.
Observe the impact of the OS cache: higher ratios and more dark red cells.
| Compression Algorithm | gzip | lz4 | qipc | snappy | zstd | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compression Level | 1 | 5 | 6 | 9 | 1 | 5 | 9 | 12 | 16 | 0 | 0 | -7 | 1 | 10 | 12 | 14 | 22 |
| query | |||||||||||||||||
| select from quote where date=2022.03.31, i<500000000 | 21.4 | 21.5 | 21.4 | 21.4 | 21.3 | 21.0 | 21.0 | 20.9 | 20.9 | 21.2 | 21.0 | 21.4 | 21.6 | 21.4 | 21.3 | 21.5 | 21.6 |
| select nr: count i, avgMid: avg (bid + ask) % 2 by sym from quoteNorm where date=2022.03.31 | 19.1 | 19.2 | 19.3 | 19.2 | 19.0 | 18.7 | 18.6 | 18.4 | 18.5 | 19.5 | 19.4 | 19.5 | 19.5 | 19.4 | 19.3 | 19.4 | 19.4 |
| aj[`sym`time`ex; select from tradeNorm where date=2022.03.31, size>500000; select from quoteNorm where date=2022.03.31] | 4.2 | 4.0 | 4.0 | 4.0 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 3.2 | 2.5 | 2.5 | 2.7 | 2.7 | 2.7 | 2.9 | 2.9 |
| select from tradeNorm where date=2022.03.31, i<>0 | 12.3 | 12.3 | 12.4 | 12.4 | 12.1 | 12.2 | 12.1 | 12.2 | 12.2 | 12.0 | 12.0 | 12.4 | 12.5 | 12.4 | 12.4 | 12.5 | 12.6 |
| distinct select sym, ex from tradeNorm where date=2022.03.31, size > 700000 | 16.7 | 16.8 | 16.7 | 16.7 | 13.3 | 13.1 | 13.1 | 13.1 | 13.2 | 13.2 | 14.1 | 14.0 | 14.0 | 14.1 | 14.0 | 14.2 | 14.1 |
| aj[`Symbol`Time; select from trade where date=2022.03.31, Symbol in someSyms; select from quote where date=2022.03.31] | 1.7 | 1.8 | 1.8 | 1.8 | 1.3 | 1.3 | 1.3 | 1.3 | 1.3 | 1.4 | 1.3 | 1.6 | 1.7 | 1.7 | 1.7 | 1.7 | 1.7 |
| select bsize wavg bid, asize wavg ask from quoteNorm where date=2022.03.31, sym in someSyms | 16.0 | 14.0 | 13.5 | 13.0 | 7.5 | 6.5 | 6.5 | 6.5 | 6.5 | 13.0 | 9.5 | 10.0 | 10.5 | 9.0 | 8.5 | 9.0 | 9.0 |
| select from quote where date=2022.03.31, Symbol=`VFVA | 9.3 | 10.7 | 10.7 | 10.7 | 6.0 | 6.0 | 6.0 | 6.0 | 5.7 | 7.0 | 6.3 | 9.3 | 10.3 | 10.7 | 10.3 | 10.3 | 10.7 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 15.3 | 15.5 | 15.4 | 15.4 | 9.7 | 9.2 | 9.0 | 9.0 | 8.9 | 11.2 | 11.2 | 12.0 | 13.2 | 13.0 | 12.6 | 12.9 | 12.7 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 10.7 | 11.4 | 11.5 | 11.5 | 10.8 | 11.0 | 10.6 | 11.0 | 10.5 | 10.5 | 10.5 | 13.0 | 12.7 | 13.0 | 12.8 | 12.6 | 12.7 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 15.9 | 15.5 | 15.1 | 15.1 | 8.9 | 8.4 | 8.4 | 8.2 | 8.2 | 13.7 | 10.9 | 13.9 | 14.5 | 14.2 | 14.2 | 14.3 | 13.9 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 11.6 | 12.1 | 12.0 | 12.2 | 11.8 | 11.8 | 11.6 | 12.0 | 12.1 | 12.1 | 11.4 | 13.1 | 13.6 | 13.4 | 13.6 | 12.9 | 13.2 |
| raze {select from quoteNorm where date=2022.03.31, sym=x, 4000<bsize+asize} peach infreqIdList | 8.5 | 8.5 | 8.3 | 8.2 | 8.3 | 8.4 | 8.5 | 8.5 | 8.1 | 8.4 | 8.8 | 8.4 | 8.6 | 8.2 | 8.2 | 8.2 | 8.2 |
| select medMidSize: med (bsize + asize) % 2 from quoteNorm where date=2022.03.31, sym=`CIIG.W | 3.0 | 3.0 | 3.0 | 3.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
Key Observations¶
- Compression slows queries, especially for CPU-bound workloads (e.g., multiple aggregations using multi-threaded primitives). Some queries were 20× slower with compression.
- OS caching amplifies slowdowns: when data resides in memory, compression overhead becomes more pronounced. Recommendation: avoid compression for frequently accessed ("hot") data.
- Compression level has negligible impact on decompression speed. This is consistent with zstd's documentation: "Decompression speed is preserved and remains roughly the same at all settings, a property shared by most LZ compression algorithms, such as zlib or lzma."
- Algorithm choice matters minimally, except for lz4 with restrictive queries. However, lz4 trades speed for higher disk usage.
Let us see how compression impacts query times if the data is stored on slower (NFS) storage. The table below displays the ratios for the first execution of the queries. We omit the results of the second run because they are similar to the fast-storage case.
| Compression Algorithm | zstd | lz4 | snappy | gzip | qipc | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compression Level | -7 | 22 | 1 | 14 | 12 | 10 | 9 | 5 | 16 | 1 | 12 | 0 | 9 | 6 | 5 | 1 | 0 |
| query | |||||||||||||||||
| select from quote where date=2022.03.31, i<500000000 | 0.4 | 0.2 | 0.3 | 0.2 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 | 0.4 |
| select nr: count i, avgMid: avg (bid + ask) % 2 by sym from quoteNorm where date=2022.03.31 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 |
| aj[`sym`time`ex; select from tradeNorm where date=2022.03.31, size>500000; select from quoteNorm where date=2022.03.31] | 0.6 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.5 | 0.4 | 0.5 | 0.4 | 0.5 | 0.5 | 0.6 | 0.5 | 0.6 | 0.5 |
| select from tradeNorm where date=2022.03.31, i<>0 | 0.4 | 0.2 | 0.3 | 0.2 | 0.2 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.4 | 0.2 | 0.2 | 0.2 | 0.3 | 0.4 |
| distinct select sym, ex from tradeNorm where date=2022.03.31, size > 700000 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.4 | 0.3 | 0.4 | 0.3 | 0.5 | 0.3 | 0.4 | 0.4 | 0.4 | 0.6 |
| aj[`Symbol`Time; select from trade where date=2022.03.31, Symbol in someSyms; select from quote where date=2022.03.31] | 0.9 | 0.8 | 1.0 | 0.8 | 0.8 | 0.8 | 0.9 | 1.0 | 0.8 | 0.9 | 0.9 | 1.1 | 1.5 | 0.9 | 0.9 | 1.0 | 1.0 |
| select bsize wavg bid, asize wavg ask from quoteNorm where date=2022.03.31, sym in someSyms | 1.2 | 0.9 | 1.0 | 0.9 | 0.8 | 0.9 | 1.0 | 1.1 | 1.0 | 1.1 | 0.9 | 1.3 | 1.0 | 0.9 | 1.0 | 1.1 | 1.3 |
| select from quote where date=2022.03.31, Symbol=`VFVA | 1.9 | 1.6 | 1.9 | 1.6 | 1.6 | 1.7 | 2.1 | 2.1 | 1.9 | 2.1 | 1.8 | 2.6 | 1.9 | 1.8 | 1.9 | 1.8 | 2.3 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 1.3 | 1.2 | 1.3 | 1.3 | 1.2 | 1.2 | 1.2 | 1.4 | 1.3 | 1.5 | 1.3 | 1.5 | 1.4 | 1.4 | 1.4 | 1.7 | 1.6 |
| raze {select from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 6.3 | 4.9 | 5.2 | 5.0 | 5.2 | 5.4 | 5.2 | 5.3 | 5.5 | 5.9 | 5.2 | 5.7 | 5.0 | 5.0 | 4.8 | 5.4 | 6.4 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} each infreqIdList | 4.8 | 4.6 | 5.0 | 4.3 | 4.7 | 4.7 | 4.1 | 3.9 | 3.9 | 3.9 | 3.4 | 4.8 | 5.6 | 5.5 | 5.7 | 5.8 | 5.0 |
| raze {select first sym, wsumAsk:asize wsum ask, wsumBid: bsize wsum bid, sdevask:sdev ask,sdevbid:sdev bid, corPrice:ask cor bid, corSize: asize cor bsize from quoteNorm where date=2022.03.31, sym=x} peach infreqIdList | 5.3 | 3.8 | 3.7 | 3.7 | 3.6 | 3.8 | 4.2 | 4.3 | 4.1 | 4.7 | 4.1 | 5.0 | 3.6 | 3.8 | 3.8 | 4.0 | 5.0 |
| raze {select from quoteNorm where date=2022.03.31, sym=x, 4000<bsize+asize} peach infreqIdList | 3.9 | 3.5 | 3.4 | 3.1 | 3.4 | 3.5 | 3.2 | 3.8 | 3.5 | 4.0 | 3.5 | 4.1 | 3.3 | 3.6 | 3.3 | 3.5 | 4.0 |
| select medMidSize: med (bsize + asize) % 2 from quoteNorm where date=2022.03.31, sym=`CIIG.W | 2.4 | 2.7 | 2.2 | 1.8 | 2.1 | 2.1 | 2.0 | 2.0 | 1.6 | 1.8 | 2.0 | 2.7 | 2.2 | 1.7 | 2.3 | 2.0 | 2.4 |
Compression improves performance when large datasets are read from slow storage. Thus, it is recommended for cold tiers (rarely accessed data).
Summary¶
For an optimal balance of cost and query performance, we recommend a tiered storage strategy
- Hot tier should not employ compression to allow maximal ingestion rate and fastest queries. The hot tier should be located on fast storage. This tier typically stores a few weeks to month of data.
- The second tier contains a high volume of less frequently queried data. We recommend using compression. If the goal is ultimate query speed, then `snappy` or `lz4` with level 5 or 6 are good choices: choose `lz4` if the second priority is storage saving, and `snappy` if you would like the data migration process (from the hot tier) to be fast. If storage space is limited and you would like to achieve a high compression ratio, then use `zstd` level 10 for most columns and `gzip` level 5 for sequence-number-like columns.
  - Columns with the parted attribute (e.g. `sym`) are exceptions and should not be compressed.
- The cold tier contains high-volume, rarely accessed data. It is typically placed on cheaper and slower storage, such as object storage or HDD-backed solutions. We recommend using compression. If the second tier uses `lz4` or `snappy`, then you might want to recompress the data with `zstd` to save more storage space.
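The savings from recompression can be checked with `-21!`, which returns the compression statistics of a file (an empty dictionary if the file is uncompressed). A sketch with an illustrative path:

```q
/ compression statistics of a compressed column file
stats: -21! `:/cold/db/2022.03.31/quote/Offer_Price
/ compression ratio = uncompressed size divided by compressed size
stats[`uncompressedLength] % stats `compressedLength
```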
Not all tables require identical partitioning strategies. Frequently accessed tables may remain in the hot tier for extended durations. Conversely, even within a heavily queried table, certain columns might be seldom accessed. In such cases, symbolic links can be used to migrate column files to the appropriate storage tier.
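For example, a seldom-read column file can be moved to the cold tier and replaced by a symbolic link, so existing queries continue to resolve the original path (paths below are illustrative):

```q
/ migrate one column file to the cold tier and link it back
system "mv /hot/db/2022.03.31/quote/FINRA_ADF_Timestamp /cold/db/2022.03.31/quote/"
system "ln -s /cold/db/2022.03.31/quote/FINRA_ADF_Timestamp /hot/db/2022.03.31/quote/FINRA_ADF_Timestamp"
```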
Infrastructure¶
Tests were conducted on version 9.4 of Red Hat Enterprise Linux using kdb+ 4.1 (version 2025.01.17). Compression performance depends on the compression library versions, which are listed below:
- `zlib`: 1.2.11
- `lz4`: 1.9.3
- `snappy`: 1.1.8
- `zstd`: 1.5.1
Key specifications for the two systems:
- Local block storage and Intel Xeon 6 efficient CPU
- Storage: Intel SSD D7-P5510 (3.84 TB), with interface PCIe 4.0 x4, NVMe
- CPU: Intel(R) Xeon(R) 6780E (Efficient series)
- Sockets: 2
- Cores per socket: 144
- Thread(s) per core: 1
- NUMA nodes: 2
- Filesystem: ext4
- Memory: 502 GiB, DDR5 6400 MT/s, 8 channels
- NFS storage and Intel Xeon 6 performance CPU
- Storage: NFS (version 4.2), mounted in sync mode, with read and write chunk sizes (`rsize` and `wsize`) of 1 MB. NFS caching was not enabled, i.e. the `-o fsc` mount parameter was not set.
- Some network parameters:
  - MTU: 1500
  - TCP read/write buffer sizes (`/proc/sys/net/core/rmem_default`, `/proc/sys/net/core/wmem_default`): 212992 bytes
- CPU: Intel(R) Xeon(R) 6747P (Performance series)
- Sockets: 2
- Cores per socket: 48
- Thread(s) per core: 2
- NUMA nodes: 4
- Memory: 502 GiB, DDR5 6400 MT/s, 8 channels
The tests ran on a single NUMA node, using local-node memory only. This was achieved by launching the kdb+ processes with `numactl -N 0 -m 0`.
Data¶
We used publicly available NYSE TAQ data for this analysis. Tables `quote` and `trade` were generated using the script `taq.k`. Table `quote` had 1.78 billion rows and consumed 180 GB of disk space uncompressed. Table `trade` was smaller: it contained 76 million rows and required 5.7 GB. All tables were parted by instrument ID (column `Symbol`). The data corresponds to a single day in 2022.
Below you can find some column details, including
- data type,
- uncompressed size,
- number of unique items,
- number of value changes,
- entropy using logarithm base 2.
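The entropy figures can be reproduced from value frequencies; a minimal q sketch of base-2 Shannon entropy (our assumption of the exact formula used):

```q
/ Shannon entropy (base 2) from the relative frequency of each value
entropy: {neg sum p * 2 xlog p: (count each group x) % count x}
entropy "aabb"    / 1f - two equally likely symbols
```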
Table trade:
| Column Name | Data Type | Size (bytes) | Unique values | Value changes | Entropy (bits) |
|---|---|---|---|---|---|
| Time | timespan | 612882088 | 72361453 | 72635907 | 26.0 |
| Exchange | char | 76610275 | 19 | 39884984 | 3.4 |
| Symbol | sym | 613372360 | 11615 | 11615 | 10.9 |
| SaleCondition | sym | 612886168 | 118 | 34742301 | 3.1 |
| TradeVolume | int | 306441052 | 30432 | 61288223 | 5.5 |
| TradePrice | real | 306441052 | 1709277 | 29812918 | 15.5 |
| TradeStopStockIndicator | boolean | 76610275 | 2 | 3 | 0.0 |
| TradeCorrectionIndicator | short | 153220534 | 5 | 1461 | 0.0 |
| SequenceNumber | int | 306441052 | 7052899 | 76610223 | 22.3 |
| TradeId | string | 1507277336 | 25424743 | 76578507 | 20.0 |
| SourceofTrade | char | 76610275 | 2 | 4408 | 0.9 |
| TradeReportingFacility | boolean | 76610275 | 2 | 8561937 | 0.5 |
| ParticipantTimestamp | timespan | 612882088 | 61126348 | 63260094 | 25.6 |
| TradeReportingFacilityTRFTimestamp | timespan | 612882088 | 19609986 | 26931059 | 7.0 |
| TradeThroughExemptIndicator | boolean | 76610275 | 2 | 17317273 | 0.9 |
Table quote:
| Column Name | Data Type | Size (bytes) | Unique values | Value changes | Entropy (bits) |
|---|---|---|---|---|---|
| Time | timespan | 14248237176 | 1580281467 | 1674631088 | 30.4 |
| Exchange | char | 1781029661 | 17 | 1349077745 | 3.7 |
| Symbol | sym | 14248731544 | 12127 | 12127 | 11.0 |
| Bid_Price | float | 14248237176 | 155049 | 815619737 | 14.3 |
| Bid_Size | int | 7124118596 | 7279 | 1100563342 | 3.9 |
| Offer_Price | float | 14248237176 | 168093 | 822513327 | 14.3 |
| Offer_Size | int | 7124118596 | 7264 | 1121734176 | 4.0 |
| Quote_Condition | char | 1781029661 | 7 | 451430 | 0.0 |
| Sequence_Number | int | 7124118596 | 113620772 | 1781029645 | 26.2 |
| National_BBO_Ind | char | 1781029661 | 9 | 525963500 | 1.7 |
| FINRA_BBO_Indicator | char | 1781029661 | 1 | 1 | 0.0 |
| FINRA_ADF_MPID_Indicator | char | 1781029661 | 1 | 1 | 0.0 |
| Quote_Cancel_Correction | char | 1781029661 | 1 | 1 | 0.0 |
| Source_Of_Quote | char | 1781029661 | 2 | 4421 | 0.8 |
| Retail_Interest_Indicator | char | 1781029661 | 4 | 282004323 | 0.6 |
| Short_Sale_Restriction_Indicator | char | 1781029661 | 8 | 6161 | 1.0 |
| LULD_BBO_Indicator | char | 1781029661 | 2 | 3 | 0.0 |
| SIP_Generated_Message_Identifier | char | 1781029661 | 3 | 32151 | 0.0 |
| National_BBO_LULD_Indicator | char | 1781029661 | 8 | 26600 | 0.1 |
| Participant_Timestamp | timespan | 14248237176 | 1720929968 | 1773753484 | 30.6 |
| FINRA_ADF_Timestamp | timespan | 14248237176 | 1 | 1 | 0.0 |
| FINRA_ADF_Market_Participant_Quote_Indicator | char | 1781029661 | 1 | 1 | 0.0 |
| Security_Status_Indicator | char | 1781029661 | 7 | 52 | 0.0 |