Parquet
This page explains what Parquet support in KDB-X is and when to use it.
Parquet is an open-source columnar file format designed for compact storage and fast retrieval. KDB-X reads Parquet files through the pq module, so you can query large datasets in place and interoperate with other data platforms.
Why Parquet with KDB-X?
- Fast analytics at scale: Query large Parquet datasets efficiently with row group pruning and virtual tables.
- Interoperability: Exchange data seamlessly with ecosystems like Spark, Pandas, Hive, or Arrow.
- Reduced storage costs: Take advantage of columnar compression (for example, snappy, zstd) while keeping data queryable.
- Seamless integration: Use q or SQL queries directly on Parquet files, alongside in-memory or partitioned tables.
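Row group pruning, mentioned above, relies on the minimum and maximum statistics that a Parquet file stores for each row group: a reader can skip any group whose range cannot satisfy the query filter. The following is a conceptual sketch in plain Python, not the KDB-X pq module. The `RowGroup` class, the `scan` function, and the sample data are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group with min/max statistics."""
    min_price: float
    max_price: float
    rows: list  # list of (sym, price) tuples


def scan(row_groups, lo, hi):
    """Return rows with lo <= price <= hi and the number of groups actually read."""
    matched, groups_read = [], 0
    for rg in row_groups:
        # Pruning step: the stored min/max show no row in this group can match.
        if rg.max_price < lo or rg.min_price > hi:
            continue
        groups_read += 1
        matched.extend(r for r in rg.rows if lo <= r[1] <= hi)
    return matched, groups_read


# Illustrative data only: three row groups with non-overlapping price ranges.
groups = [
    RowGroup(1.0, 9.5, [("a", 1.0), ("b", 9.5)]),
    RowGroup(10.0, 19.0, [("c", 10.0), ("d", 19.0)]),
    RowGroup(20.0, 30.0, [("e", 20.0), ("f", 30.0)]),
]
rows, groups_read = scan(groups, 12.0, 25.0)
```

In this sketch the first group is skipped without reading its rows, because its maximum price is below the filter's lower bound. Pruning works best when the data is sorted or clustered on the filtered column, so that each row group covers a narrow range.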
Use cases
- Data interchange: Share Parquet datasets between KDB-X and tools like Spark, Pandas, and Hive.
- Efficient analytics: Run SQL or q queries directly against Parquet files with row group pruning.
- Archival storage: Keep large historical datasets compressed but queryable.
- Hybrid queries: Join or aggregate across in-memory tables and Parquet-backed virtual tables in one query.
Next steps
- Explore the Parquet Introduction for practical examples of reading, querying, and combining Parquet files in KDB-X.