Machine Learning Toolkit
Models, preprocessing functionality, and model evaluation methods
The Machine Learning Toolkit (ML-Toolkit) is at the core of kdb+/q machine-learning functionality. It comprises the open-source libraries and scripts that let you apply machine-learning models, preprocessing techniques, and scoring functionality to a wide variety of kdb+ datasets.
For easier use within the Insights analytics functionality, some of the sections below have been ‘wrapped’ as an experimental API.
The ML-Toolkit provides:
- Clustering algorithms to group data points and identify patterns in their distributions. The algorithms make use of a k-dimensional tree to store points and scoring functions to analyze how well they performed.
- An implementation of the FRESH algorithm in q. This lets a q user perform feature-extraction and feature-significance tests on structured time-series data for forecasting, regression and classification.
- Implementations of a number of cross-validation and hyperparameter search procedures. These allow q users to validate the performance of machine-learning models when exposed to new data, test the stability of models over time, or find the best hyperparameters for tuning their models.
- Various implementations of time-series models for q, including but not limited to ARMA, ARIMA, SARIMA and ARCH. These allow q users to predict the future value of datasets based on historical observations and to measure statistical properties of future data.
- A number of statistical algorithms that allow users to retrieve information about the contents of their data and to build regression models such as Ordinary Least Squares and Weighted Least Squares.
- Miscellaneous utilities including but not limited to model metrics, data manipulation and preprocessing functions.