Machine Learning Toolkit

Models, preprocessing functionality, and model evaluation methods

The Machine Learning Toolkit (ML-Toolkit) is at the core of kdb+/q machine-learning functionality. It comprises open-source libraries and scripts that let you apply machine-learning models, preprocessing techniques, and scoring functionality to a wide variety of kdb+ datasets.

For easier use within the Insights analytics functionality, some of the sections below have been ‘wrapped’ as an experimental API.

The ML-Toolkit provides:

  • Clustering algorithms to group data points and identify patterns in their distributions. The algorithms make use of a k-dimensional tree to store points and scoring functions to analyze how well they performed.
  • An implementation of the FRESH algorithm in q. This lets a q user perform feature-extraction and feature-significance tests on structured time-series data for forecasting, regression and classification.
  • Implementations of a number of cross-validation and hyperparameter search procedures. These allow q users to validate the performance of machine-learning models when exposed to new data, test the stability of models over time, or find the best hyperparameters for tuning their models.
  • Various implementations of time-series models for q, including but not limited to ARMA, ARIMA, SARIMA and ARCH. These allow q users to predict the future values of datasets based on historical observations and to measure statistical properties of future data.
  • A number of statistical algorithms that allow users to retrieve information about the contents of their data and to build regression models such as Ordinary Least Squares and Weighted Least Squares.
  • Miscellaneous utilities including but not limited to model metrics, data manipulation and preprocessing functions.
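
As a language-agnostic illustration of the regression models named in the list above, the following minimal sketch contrasts Ordinary Least Squares with Weighted Least Squares for a single predictor. It is written in plain Python rather than q and does not use or mirror the ML-Toolkit API; the function name `fit_line` and the sample data are hypothetical and exist only for this example.

```python
def fit_line(x, y, w=None):
    """Fit y ~ intercept + slope * x by minimising sum(w_i * residual_i ** 2).

    With w=None every point gets weight 1, which is Ordinary Least Squares.
    Passing explicit weights gives Weighted Least Squares.
    (Hypothetical helper for illustration; not part of the ML-Toolkit.)
    """
    if w is None:
        w = [1.0] * len(x)
    total = sum(w)
    # Weighted means of the predictor and the target
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variance (unnormalised; the factor cancels)
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: the first four points lie on y = 1 + 2x, the last is an outlier
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 21.0]

ols_intercept, ols_slope = fit_line(x, y)
# A weight of 0 removes the outlier's influence entirely
wls_intercept, wls_slope = fit_line(x, y, w=[1.0, 1.0, 1.0, 1.0, 0.0])

print("OLS:", ols_intercept, ols_slope)
print("WLS:", wls_intercept, wls_slope)
```

In this sketch the unweighted fit is pulled towards the outlier, while the weighted fit recovers the line through the remaining four points. In practice, weights usually reflect how much each observation is trusted (for example, the inverse of its variance) rather than being set to zero.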