Parquet Overview in KDB-X
This page provides a high-level overview of Parquet support in KDB-X. It explains why Parquet is valuable within KDB-X, outlines the main benefits, and describes common scenarios where it applies. Use this page as a starting point before exploring the detailed concepts, architecture, and limitations covered in the Introduction.
Parquet is a columnar storage format designed for efficient storage and retrieval. KDB-X supports reading Parquet files through the pq module, making it easy to query large datasets and interoperate with other data platforms.
Why Parquet with KDB-X?
- Fast analytics at scale: Query large Parquet datasets efficiently with row group pruning and virtual tables.
- Interoperability: Exchange data with ecosystems such as Spark, Pandas, Hive, and Arrow.
- Reduced storage costs: Take advantage of columnar compression (for example, snappy, zstd) while keeping data queryable.
- Seamless integration: Use q or SQL queries directly on Parquet files, alongside in-memory or partitioned tables.
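To make the last point concrete, the sketch below shows what querying a Parquet file from q could look like. It is illustrative only: the read function name (`.pq.read`), how the pq module is loaded, and the file path and column names are all assumptions, not the confirmed API — consult the pq module reference for the exact names.

```q
/ Illustrative sketch, not confirmed API: assumes the pq module is loaded
/ and exposes a read function; .pq.read and trades.parquet are placeholders.
trades:.pq.read["trades.parquet"]        / read a Parquet file into a q table
select avg price by sym from trades      / then query it with ordinary qSQL
```

Once the file is materialized (or exposed as a virtual table), it behaves like any other q table, so the usual qSQL and SQL tooling applies unchanged.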
Use cases
- Data interchange: Share Parquet datasets between KDB-X and tools like Spark, Pandas, and Hive.
- Efficient analytics: Run SQL or q queries directly against Parquet files with row group pruning.
- Archival storage: Keep large historical datasets compressed but queryable.
- Hybrid queries: Join or aggregate across in-memory tables and Parquet-backed virtual tables in one query.
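The hybrid-query case above can be sketched as a join between an in-memory reference table and a Parquet-backed table. Again, `.pq.read`, the file path, and the column names are illustrative assumptions; only the join and aggregation syntax is standard q.

```q
/ Small in-memory reference table mapping symbols to sectors
ref:([] sym:`AAPL`MSFT; sector:`tech`tech)

/ Parquet-backed data; .pq.read is an assumed function name for illustration
trades:.pq.read["trades.parquet"]

/ Left-join the keyed reference table, then aggregate across both sources
select vwap:size wavg price by sector from trades lj 1!ref
```

Here `1!ref` keys the reference table on its first column (`sym`) so `lj` can match trades to sectors, letting one query combine in-memory and Parquet-backed data.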
Next steps
- Explore the Parquet Introduction for practical examples of reading, querying, and combining Parquet files in KDB-X.