Datagen Module
This page explains what the Datagen module is and when to use it.
The KDB-X Datagen module provides configurable data generators that create reproducible in-memory and on-disk q tables across domains such as Capital Markets and DevOps. It enables users to simulate real-world scenarios and work with test datasets using q-SQL, SQL, and KDB-X Python.
Key features
- Synthetic data: Generate q tables that reflect real-world schema patterns.
- Configurable generation: Control data volume, characteristics, and domain-specific parameters.
- Efficient q-native output: Output tables can be analyzed with q-SQL, SQL, and KDB-X Python.
- High-volume generation: Vectorized and loop-efficient q implementations enable testing with large datasets.
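To make "reproducible" and "configurable" concrete, here is a minimal sketch of the general idea in plain Python. It does not use or represent the Datagen API: the function `generate_trades`, its parameters, and the trade schema are all hypothetical, chosen only to show how a fixed seed yields identical data on every run while parameters control volume.

```python
import random
from datetime import datetime, timedelta


def generate_trades(n_rows=5, symbols=("AAPL", "MSFT", "GOOG"), seed=42):
    """Return a list of synthetic trade records (hypothetical schema, not Datagen's)."""
    # A local generator means the seed alone determines the output.
    rng = random.Random(seed)
    start = datetime(2024, 1, 2, 9, 30)
    rows = []
    for i in range(n_rows):
        rows.append({
            "time": start + timedelta(seconds=i),  # one record per second
            "sym": rng.choice(symbols),
            "price": round(rng.uniform(100.0, 200.0), 2),
            "size": rng.randint(1, 10) * 100,
        })
    return rows


# Same seed and parameters: identical data on every run.
a = generate_trades(seed=7)
b = generate_trades(seed=7)

# Overriding a default (here, the row count) scales the data volume.
big = generate_trades(n_rows=1000, seed=7)
```

The same pattern, defaults that work out of the box plus overridable parameters and a deterministic seed, is what makes generated test data comparable across runs and machines.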
Typical use cases
Datagen is useful when you need sample data in place of production datasets:
- Learn q, q-SQL, and SQL using structured, ready-to-query datasets.
- Explore data analysis techniques without requiring access to live systems.
Next steps
To get started with Datagen in KDB-X:
- Install the Datagen module into your KDB-X environment.
- Pick an appropriate domain, for example, Capital Markets or DevOps.
- Run data generation with the default parameters, or override the defaults (for example, to increase data volume).
- Use the generated data in q analytics, SQL queries, or Python workflows.
For a full list of available domains, configuration options, and examples, refer to the Datagen documentation on GitHub.