Datagen Module

This page provides an overview of the Datagen module, which generates synthetic q datasets for learning and testing data analysis in KDB-X.

The KDB-X Datagen module provides configurable data generators that create reproducible in-memory and on-disk q tables across domains such as Capital Markets and DevOps. It enables users to simulate real-world scenarios and work with test datasets using q-SQL, SQL, and KDB-X Python.

Key features

Synthetic data: Generate q tables that reflect real-world schema patterns.
Configurable generation: Control data volume, characteristics, and domain-specific parameters.
Native q output: Generated tables are ordinary q tables that can be analyzed with q-SQL, SQL, and KDB-X Python.
High-volume generation: Vectorized, loop-efficient q implementations make it practical to generate large datasets for testing.

Typical use cases

Datagen is useful when you need sample data in place of production datasets:

Learn q, q-SQL, and SQL using structured, ready-to-query datasets.
Explore data analysis techniques without requiring access to live systems.

Next steps

To get started with Datagen in KDB-X:

Install the Datagen module into your KDB-X environment.
Pick an appropriate domain, for example, Capital Markets or DevOps.
Run data generation with the default parameters, or override them (for example, to increase data volume).
Use the generated data in q analytics, SQL queries, or Python workflows.
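Datagen's own entry points and parameter names are documented on GitHub and are not reproduced here. As a language-neutral illustration of what "reproducible" generation and "override defaults" mean in step 3, the minimal sketch below uses a hypothetical `TradeParams` configuration and a hypothetical `generate_trades` function (neither is part of Datagen). It assumes a seeded random number generator and builds a column-oriented table, similar in shape to a q table, as a Python dict of lists.

```python
import random
from dataclasses import dataclass


# Hypothetical parameter set for illustration only; the real Datagen
# module defines its own per-domain options, which are not shown here.
@dataclass(frozen=True)
class TradeParams:
    rows: int = 1_000                       # data volume
    syms: tuple = ("AAPL", "MSFT", "GOOG")  # instrument universe
    base_price: float = 100.0               # prices are drawn around this level
    seed: int = 42                          # fixed seed, so output is reproducible


def generate_trades(p: TradeParams = TradeParams()) -> dict[str, list]:
    """Build a column-oriented table (one list per column) from a seeded RNG."""
    rng = random.Random(p.seed)  # private generator; leaves global random state alone
    sym = [rng.choice(p.syms) for _ in range(p.rows)]
    price = [round(p.base_price + rng.uniform(-5, 5), 2) for _ in range(p.rows)]
    size = [100 * rng.randint(1, 10) for _ in range(p.rows)]
    return {"sym": sym, "price": price, "size": size}


# Run with the default parameters...
default_table = generate_trades()
# ...or override a default, for example to increase data volume.
big_table = generate_trades(TradeParams(rows=100_000))

# Same parameters and seed produce identical data on every run.
assert generate_trades() == default_table
print(len(default_table["sym"]), len(big_table["sym"]))  # 1000 100000
```

Keeping the seed inside the parameter set, rather than relying on global random state, is what lets two runs with the same parameters produce identical tables.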

For a full list of available domains, configuration options, and examples, refer to the Datagen documentation on GitHub.