Taq Module

The NYSE TAQ Data Loader is a high-performance ingestion module designed to streamline the transformation of large-scale NYSE Trade and Quote (TAQ) PSV files into optimized, date-partitioned kdb+ databases. It replaces legacy k-based scripts with a modern, q-native implementation that leverages multithreaded parsing and parallel column writes to maximize throughput. By offering granular control over memory usage through batch processing and flexible filtering by symbol or date, the module provides a robust framework for managing massive datasets.

Key features

  • Memory Management: Supports batch processing (configurable through batchsize) to prevent RAM exhaustion during large file loads.
  • Customizable Filtering: Built-in options to filter by symbol range (first letter) and to exclude test symbols.
  • Compression Support: Native integration with .z.zd settings, allowing for on-the-fly compression during the ingestion process.
  • Parallel Ingestion: Utilizes secondary threads (the -s command-line option) for both PSV parsing and simultaneous column writes to disk.
  • Modern Tooling: Rewritten in q for better maintainability, featuring enhanced error handling, logging and documentation in qdoc syntax.
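As a hedged illustration of the compression and threading features above (the loader's own flags and function names are documented on GitHub; `batchsize` is the only parameter named here, the rest follows standard kdb+ conventions):

  q)/ launch with secondary threads for parallel parsing and column writes
  q)/ $ q taq.q -s 8
  q)/ .z.zd sets default on-the-fly compression for subsequent writes:
  q)/ (logical block size as a power of 2; algorithm; level)
  q).z.zd:17 2 6     / 128kB blocks, gzip (algorithm 2), level 6
  q)/ .z.zd:17 4 12  / lz4hc (algorithm 4)
  q)/ .z.zd:17 5 10  / zstd (algorithm 5), kdb+ 4.1+

With `.z.zd` set, every column written during ingestion is compressed transparently, with no change to the loading code.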

Typical use cases: Benchmarking

The module is primarily used as a standardized tool for performance benchmarking across different environments. Typical scenarios include:

  • Hardware Validation: Testing disk I/O and CPU scaling by measuring ingestion speed, and evaluating query times against real data.
  • Optimization Tuning: Comparing the impact of different kdb+ settings, such as adjusting compression algorithms (for example, LZ4 vs. Zstd) and logical block sizes to balance disk space versus ingestion speed versus query latency.
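A minimal sketch of such a compression comparison in q, independent of the loader itself (the file paths `:gz/v` and `:lz4/v` are hypothetical; `\t` returns elapsed milliseconds):

  q)v:10000000?100f            / sample column: 10m random floats
  q).z.zd:17 2 6               / default compression: gzip level 6
  q)\t `:gz/v set v            / time the write under gzip
  q).z.zd:17 4 12              / switch default to lz4hc
  q)\t `:lz4/v set v           / time the write under lz4hc
  q)hcount each `:gz/v`:lz4/v  / compare on-disk sizes in bytes
  q)get[`:gz/v]~get `:lz4/v    / 1b: contents identical after decompression

Repeating this over representative TAQ columns gives the disk-space versus ingestion-speed trade-off; query latency is then measured separately against the resulting partitioned database.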

For a full list of available functions and examples, refer to the Taq documentation on GitHub.