Machine learning
Machine-learning capabilities are at the heart of future technology development at KX.
Our libraries are released under the Apache 2 license, and are free for all use cases, including 64-bit and commercial use.
Machine Learning Toolkit
The Machine Learning Toolkit is at the core of our machine-learning functionality. This library contains functions that cover the following areas.
- Accuracy metrics to test the performance of constructed machine-learning models.
- Pre-processing data prior to the application of machine-learning algorithms.
- An implementation of the FRESH algorithm for feature extraction and selection on structured time series data.
- Utility functions which are useful in many machine-learning applications but do not fall within the other sections of the toolkit.
- Cross-validation functions, used to verify how robust and stable a machine-learning model is to changes in the data being interrogated and the volume of this data.
- Clustering algorithms used to group data points and to identify patterns in their distributions. The algorithms make use of a k-dimensional tree to store points and scoring functions to analyze how well they performed.
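As a rough illustration of how an accuracy metric and cross-validation fit together, the sketch below splits a set of labels into k folds and scores a trivial baseline model on each held-out fold. It is written in Python for illustration only and is not the toolkit's API: every function name here (`accuracy`, `k_fold_indices`, `majority_class`, `cross_validate`) is hypothetical, and the majority-class baseline is an assumption chosen to keep the example short.

```python
from typing import List, Sequence, Tuple


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot score an empty set of labels")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional element each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def majority_class(train_labels: Sequence[int]) -> int:
    """Hypothetical baseline: the most common training label (smallest label wins ties)."""
    labels = list(train_labels)
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(labels: Sequence[int], k: int) -> List[float]:
    """Score the majority-class baseline on each of k held-out folds."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        guess = majority_class([labels[i] for i in train])
        scores.append(accuracy([guess] * len(test), [labels[i] for i in test]))
    return scores


labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(cross_validate(labels, 5))  # [1.0, 0.5, 1.0, 0.5, 1.0]
```

The spread of per-fold scores, not just their mean, is what indicates how stable a model is as the data changes.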
Example notebooks
The example notebooks demonstrate FRESH and other aspects of the toolkit's functionality.
Natural Language Processing
The NLP library provides the common functions for processing unstructured text, including functions for searching, clustering, keyword extraction and sentiment analysis.
Automated Machine Learning
AutoML is a framework for automating the machine-learning process in kdb+. It is built largely on the Machine Learning Toolkit and handles the following aspects of a traditional machine-learning pipeline:
- Data preprocessing
- Feature engineering and feature selection
- Model selection
- Hyperparameter tuning
- Report generation and model persistence
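To make two of these stages concrete, the following minimal sketch chains a preprocessing step (min-max scaling) with hyperparameter tuning (an exhaustive grid search). It is written in Python for illustration and does not reflect how AutoML is implemented: the function names, the one-feature threshold classifier and the parameter grid are all assumptions. For brevity the sketch scores each candidate on the same data it was fitted to; a real pipeline would score on held-out data.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Preprocessing: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def threshold_model(threshold: float, flip: bool = False) -> Callable[[float], int]:
    """Hypothetical model: predict 1 when x >= threshold (inverted if flip is set)."""
    def predict(x: float) -> int:
        label = int(x >= threshold)
        return 1 - label if flip else label
    return predict


def grid_search(
    xs: Sequence[float], ys: Sequence[int], grid: Dict[str, list]
) -> Tuple[Dict[str, object], float]:
    """Hyperparameter tuning: try every combination in the grid, keep the best."""
    best_params: Dict[str, object] = {}
    best_score = -1.0
    keys = sorted(grid)
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        model = threshold_model(**params)
        score = sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)
        # Strict comparison: on ties the earliest combination is kept.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


raw = [10, 20, 30, 40, 50, 60]
target = [0, 0, 0, 1, 1, 1]
scaled = min_max_scale(raw)
grid = {"threshold": [0.25, 0.5, 0.75], "flip": [False, True]}
print(grid_search(scaled, target, grid))  # ({'flip': False, 'threshold': 0.5}, 1.0)
```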
embedPy
embedPy loads Python into kdb+/q, allowing access to a rich ecosystem of libraries such as scikit-learn, TensorFlow and PyTorch.
- Python variables and objects become q variables – and either language can act upon them.
- Python code and files can be embedded within q code.
- Python functions can be called as q functions.
Example notebooks using embedPy
JupyterQ
JupyterQ supports Jupyter notebooks for q, providing:
- Syntax highlighting, code completion and help
- Multiline input (script-like execution)
- Inline display of charts
Technical papers
- NASA FDL: Analyzing social media data for disaster management
  Conor McCarthy, 2019.10
- NASA FDL: Predicting floods with q and machine learning
  Diane O’Donoghue, 2019.10
- An introduction to neural networks with kdb+
  James Neill, 2019.07
- NASA FDL: Exoplanets Challenge
  Esperanza López Aguilera, 2018.12
- NASA FDL: Space Weather Challenge
  Deanna Morgan, 2018.11
- Using embedPy to apply LASSO regression
  Samantha Gallagher, 2018.10
- K-Nearest Neighbor classification and pattern recognition with q
  Emanuele Melis, 2017.07
The KX machine-learning libraries are:
- well documented, with understandable and useful examples
- maintained and supported by KX on a best-efforts basis, at no cost to customers
- released under the Apache 2 license
- free for all use cases, including 64-bit and commercial use
Commercial support is available if required: please email sales@kx.com.