Automated Machine Learning in kdb+/q
The automated machine-learning platform described here is built largely on the tools available within the Machine Learning Toolkit. Its purpose is to help automate the application of machine-learning techniques to real-world problems. In the absence of expert machine-learning engineers, it handles the following processes within a traditional workflow:
- Data preprocessing
- Feature extraction and selection
- Model selection
- Model optimization
- Report generation and model persistence
Each of these steps is described in detail throughout this documentation. This allows users to understand the processes by which decisions are being made and the transformations which their data undergo during the production of the output models.
At present, the machine-learning frameworks supported are based on:
- One-to-one feature-to-target mapping for non-timeseries problems
- FRESH-based feature extraction and model production
- NLP-based feature creation and word2vec transformation
Over time, the functionality available and the problems which can be solved using this library will be extended to include:
- Timeseries use cases and architectures
- Broader workflow flexibility
- More detailed outputs describing the steps taken
This should not necessarily be seen as a replacement for commercially available automated machine-learning platforms. The work outlined here is intended to allow kdb+ users to explore the use of machine learning on their data, and to highlight automation techniques which can be deployed through kdb+ for various workflows.
This platform is currently in beta and feedback on the interface is requested. Please write to email@example.com.
Requirements

The following covers the requirements needed to run the libraries in the current build of the toolkit.
A number of Python dependencies also exist for running the embedPy functions within both the machine-learning utilities and FRESH libraries. These can be installed as outlined at KxSystems/ml, using pip:

```bash
pip install -r requirements.txt
```

or conda:

```bash
conda install --file requirements.txt
```

The requirements needed to use the NLP functionality can be found on GitHub.
Installation and loading
Install and load all libraries. Assuming the automl repository is located in QHOME, this can be achieved with the following commands from a q session:

```q
q)\l automl/automl.q
q).automl.loadfile`:init.q
```
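Once the library is loaded, a single entry point runs the full automated pipeline. Below is a minimal sketch on random data, assuming the `.automl.fit` interface of recent releases (the entry-point name and signature have varied across versions of this beta, so check your installed copy); the column names and data are purely illustrative:

```q
// illustrative data: a 100-row feature table and a binary target
features:([]f1:100?1f;f2:100?1f)
target:100?0b
// `normal = one-to-one non-timeseries framework; `class = classification
// (::) runs the pipeline with its default parameters
.automl.fit[features;target;`normal;`class;::]
```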