Parquet Overview in KDB-X
This page provides a high-level overview of Parquet support in KDB-X. It explains why Parquet is valuable within KDB-X, outlines the main benefits, and describes common scenarios where it applies. Use this page as a starting point before exploring the detailed concepts, architecture, and limitations covered in the Introduction.
Parquet is a columnar storage format designed for efficient storage and retrieval. KDB-X supports reading Parquet files through the pq module, making it easy to query large datasets and interoperate with other data platforms.
Why Parquet with KDB-X?
- Fast analytics at scale: Query large Parquet datasets efficiently with row group pruning and virtual tables.
- Interoperability: Exchange data with ecosystems such as Spark, Pandas, Hive, and Arrow.
- Reduced storage costs: Take advantage of columnar compression (for example, snappy, zstd) while keeping data queryable.
- Seamless integration: Use q or SQL queries directly on Parquet files, alongside in-memory or partitioned tables.
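To make the last point concrete, the sketch below shows what querying a Parquet file from q could look like. It is illustrative only: the read function name (`.pq.read`), how the pq module is loaded, and the file path and column names are all assumptions, not the confirmed API — consult the pq module reference for the exact names.

```q
/ Illustrative sketch, not confirmed API: assumes the pq module is loaded
/ and exposes a read function; .pq.read and trades.parquet are placeholders.
trades:.pq.read["trades.parquet"]        / read a Parquet file into a q table
select avg price by sym from trades      / then query it with ordinary qSQL
```

Once the file is materialized (or exposed as a virtual table), it behaves like any other q table, so the usual qSQL and SQL tooling applies unchanged.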
Use cases
- Data interchange: Share Parquet datasets between KDB-X and tools like Spark, Pandas, and Hive.
- Efficient analytics: Run SQL or q queries directly against Parquet files with row group pruning.
- Archival storage: Keep large historical datasets compressed but queryable.
- Hybrid queries: Join or aggregate across in-memory tables and Parquet-backed virtual tables in one query.
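The hybrid-query case above can be sketched as a join between an in-memory reference table and a Parquet-backed table. Again, `.pq.read`, the file path, and the column names are illustrative assumptions; only the join and aggregation syntax is standard q.

```q
/ Small in-memory reference table mapping symbols to sectors
ref:([] sym:`AAPL`MSFT; sector:`tech`tech)

/ Parquet-backed data; .pq.read is an assumed function name for illustration
trades:.pq.read["trades.parquet"]

/ Left-join the keyed reference table, then aggregate across both sources
select vwap:size wavg price by sector from trades lj 1!ref
```

Here `1!ref` keys the reference table on its first column (`sym`) so `lj` can match trades to sectors, letting one query combine in-memory and Parquet-backed data.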
Next steps
- Explore the Parquet Introduction for practical examples of reading, querying, and combining Parquet files in KDB-X.