Datagen Module
This page provides an overview of the Datagen module, which generates synthetic q datasets for learning and testing KDB-X data analysis.
The KDB-X Datagen module provides configurable data generators that create reproducible in-memory and on-disk q tables across domains such as Capital Markets and DevOps. It enables users to simulate real-world scenarios and work with test datasets using q-SQL, SQL, and KDB-X Python.
Key features
- Synthetic data: Generate q tables that reflect real-world schema patterns.
- Configurable generation: Control data volume, characteristics, and domain-specific parameters.
- Efficient q-native output: Output tables can be analyzed with q-SQL, SQL, and KDB-X Python.
- High-volume generation: Vectorized and loop-efficient q implementations enable testing with large datasets.
Typical use cases
Datagen is useful when you need sample data in place of production datasets:
- Learn q, q-SQL, and SQL using structured, ready-to-query datasets.
- Explore data analysis techniques without requiring access to live systems.
Next steps
To get started with Datagen in KDB-X:
- Install the Datagen module into your KDB-X environment.
- Pick an appropriate domain, for example, Capital Markets or DevOps.
- Run the data generation with the default parameters, or override the defaults (for example, to increase data volume).
- Use the generated data in q analytics, SQL queries, or Python workflows.
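The "defaults with overrides" pattern in the steps above can be illustrated with a small, self-contained Python sketch. It does not use the Datagen API: `TradeConfig`, `generate_trades`, the column names, and the value ranges are all hypothetical, chosen only to show how a seeded generator yields reproducible data whose volume can be changed through a single parameter.

```python
import random
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TradeConfig:
    """Hypothetical generation parameters (not Datagen's actual options)."""
    rows: int = 1_000
    symbols: tuple = ("AAPL", "MSFT", "GOOG")
    seed: int = 42


def generate_trades(config=None):
    """Return a list of synthetic trade records as dictionaries."""
    config = config or TradeConfig()
    # A private, seeded RNG makes the output depend only on the config.
    rng = random.Random(config.seed)
    start = datetime(2024, 1, 2, 9, 30)
    trades = []
    for i in range(config.rows):
        trades.append({
            "time": start + timedelta(seconds=i),  # one trade per second
            "sym": rng.choice(config.symbols),
            "price": round(rng.uniform(50.0, 500.0), 2),
            "size": rng.randint(1, 100) * 100,
        })
    return trades


# Run with the defaults, then override only the data volume.
default_trades = generate_trades()
large_trades = generate_trades(TradeConfig(rows=5_000))

# A simple aggregation, comparable in spirit to a q-SQL
# `select count i by sym` query over a generated table.
counts = Counter(trade["sym"] for trade in default_trades)
```

Because the random number generator is seeded from the configuration, calling `generate_trades()` twice returns identical records, which is the property that makes synthetic datasets dependable for tutorials and tests.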
For a full list of available domains, configuration options, and examples, refer to the Datagen documentation on GitHub.