The Transformer lets you take a table of messy data and refine it. It provides high-level functionality for building a workflow of data transformations in a visual environment. Workflows take source data and perform a series of operations to wrangle it and produce an output. Once a workflow has been perfected, it can be compiled into a q function that runs its operations on any input tables whose schemas match the original input tables.
Opening the Transformer
Pick Tools > Table Transformer from the Explorer main menu.
The transformer will appear in a new tab.
The Transformer allows you to create a workflow for quickly preparing data to a desired structure. A workflow is started from a Source node. A Source node can be added by picking New > Source from the Node menu. Adding a source will launch the Importer to configure a data source to be added. Once a data source is selected, clicking Finish will create the Source node.
Editing a node
Edit a node by picking Edit from its context menu. For example, edit a Source node to configure a different source to be imported.
Besides the Source nodes above, there are also Action, Join, Function, Code Graphic, and Output nodes.
An Action node allows you to construct a sequence of data-preparation functions to transform the table. These operations can be selected from a list of available operations or can be configured with qSQL.
Adding transformation actions
Table nodes in a workflow can be transformed using transformation actions. Actions perform update operations on columns in a table. Actions can be picked from the context menu of a column in the target table, or from the Actions menu.
Transformation actions you can perform:
- Update Values
Update a column’s data value based on simple or complex AND/OR query operations. For example, an analyst can use the point-and-click interface to add an action that sets the duration column to zero if the source column is equal to 300. Multiple columns can be updated at the same time, and complex nested AND/OR qualifications can be expressed in the interface.
- Filter Rows
Create filtered queries against the dataset based on simple or complex AND/OR operations using a point-and-click interface.
- Add Columns
Add new columns. The value of the column can be any valid q statement (e.g. a custom function, a column name, a variety of standard aggregation operations).
- Transform Type
Change the type of any column to any standard q type.
The context menu also contains some commonly used shortcuts.
Parse Date and Times… allows you to parse almost any numeric value into dates.
Parse Numeric… allows you to parse strings or symbols into number formats.
- Transform Cells
Perform basic data-filling operations including replacing nulls with any value.
- Transform Column
Perform a wide range of column operations as shown in the context sub-menu below. Column attributes (e.g. sorted, keyed) can dramatically improve query performance. Expansion of column dictionaries and splitting columns on value separators are particularly useful for JSON, URL and Log File data.
- Transform Text
Perform basic text operations.
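Each of these point-and-click actions corresponds to an ordinary q/qSQL operation in the compiled workflow. As a rough sketch only (the table `t` and its columns `src`, `duration`, and `name` are hypothetical, not taken from the product):

```q
t:([] src:100 300 300; duration:5 10 15f; name:("alpha";"beta";""))

/ Update Values: set duration to zero where src equals 300
update duration:0f from `t where src=300

/ Add Columns: any valid q expression can define the new column
update total:sum duration from `t

/ Transform Type: cast a column to another standard q type
update src:`float$src from `t

/ Transform Cells: replace nulls with a chosen value (fill)
update duration:0f^duration from `t

/ Transform Text: basic string operations, e.g. uppercase
update name:upper each name from `t
```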
Actions can be edited by double-clicking an Action node or picking Edit from the context menu. Editing an Action node will open a dialog for that action and allow for properties of the action to be changed. Once editing is complete, pressing OK will update the current action and re-compute the current table.
Modifying action execution
Actions can be enabled or disabled by clicking the green check box on the action node.
Disabling an action shows the state of the table before the target action has been applied. This can be used to determine how an individual action is changing the table.
Clicking the trashcan deletes the action:
A red border indicates an error in the action:
Right-clicking opens the context menu in the action list:
Error reporting in the Action List and display
Actions that encounter an error will be highlighted in red (in the Action List, the display area, or both) and will lock the current table.
All errors must be fixed before more actions can be added to a table. Workflows that contain errors cannot be saved into a transformation function until all errors are fixed.
Join nodes can be used to merge data from multiple sources using a table join. Joins take two table sources and can be configured with one of the join types listed below:
| Join | Result |
| --- | --- |
| Left | The left table with any matching keys from the right table joined to it |
| Inner | One combined record for each row in the left table that matches a row in the right table |
| Equi | Same behaviour as Inner join |
| Plus | The arithmetic sum of matching records. Common columns are left unchanged and new columns are zero |
| Union | Matching records from the right table update the left table. Unmatched records from both tables are inserted into the result |
| Upsert | Same behaviour as Union, except columns must be aligned to perform the join |
| Insert | Records from the right table are appended to the left table. Columns must be aligned |
| Zip | Records are matched based on row index |
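Several of these join types map directly onto q's built-in join primitives. A minimal sketch (the tables `a` and `b` and their columns are illustrative, not from the product):

```q
a:([] id:1 2 3; x:10 20 30)
b:([id:2 3 4] y:200 300 400)

a lj b               / Left: each row of a, with y joined where id matches
a ij b               / Inner: only rows whose id appears in both tables
(1!a) uj b           / Union: matching keys updated, unmatched rows from both kept
a pj ([id:2] x:100)  / Plus: x summed where id matches
a,'([] y:7 8 9)      / Zip-style: rows paired by index (equal row counts required)
```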
Function nodes allow arbitrary functions to be executed on a table within a workflow. A function name can be specified in the input field and will be invoked when the transform is executed. This function must take the incoming table as a parameter and should return an updated version of the same table.
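In other words, a Function node's function is an ordinary unary q function from table to table. A minimal sketch (the name `.tx.addSpread` and its columns are illustrative):

```q
/ takes the incoming table, returns an updated version of it
.tx.addSpread:{[t] update spread:ask-bid from t}

.tx.addSpread ([] sym:`a`b; bid:99.0 100.5; ask:99.5 101.0)
```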
Code Graphic nodes
Code Graphic nodes allow you to add visualizations to a transform run. These visualizations can be used as a reporting mechanism by saving completed images to disk during the transformation. The graphics exploit the power of the Grammar of Graphics using a standalone visualization language; see Grammar of Graphics for more information. Code Graphics can also be used to map other nodes, as well as global tables in the process, to visual geometries.
Output nodes are the final step in a transformation workflow. These nodes accept a single table and store it in an external format to be used later.
Output nodes can be configured using the table Exporter, which provides the ability to configure a destination in a number of different formats. The destination configured in the Output node is not written to until the transformation is run.
When a change is made to a node in a workflow it will affect all of the tables that depend on it. Keep this in mind when adding actions to a table that has dependants. Changes to the parent table in a workflow may cause errors in other tables.
Saved and custom functions
The Transformer allows you to use common statistical aggregates (e.g. count, average, maximum, minimum, distinct, standard deviation, first, last) in the Value field. You can also use any custom function you wish to create. It is similar to a stored procedure in SQL but far more powerful.
In the example below, the user has created a custom function that takes two column values, performs a mathematical computation on them, and returns the result.
The user then uses the function in the Value field, passing two other data columns to the function:
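As a sketch of the pattern (the function name `.fn.pctDiff` and the column names are illustrative, not from the example above):

```q
/ a custom function of two column arguments, as might be
/ saved and then referenced from the Value field
.fn.pctDiff:{[x;y] 100*(x-y)%y}

update chg:.fn.pctDiff[close;open] from ([] open:10 20 30f; close:11 19 33f)
```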
Once a workflow has been created it can be compiled into a transformation function.
Since only a single table can be returned from a function, the desired output table in the workflow must be selected. Once it is selected, pick Save from the File menu to compile and save a transformation function. When executed, the function will output the selected table. The compiled transformation function takes two arguments: the output and any inputs.
Transformation configurations store a table mutation sequence along with pre-defined inputs. These inputs can be overridden by passing new input configurations to this transform function.
| Parameter | Description |
| --- | --- |
| output | Output configuration. Pass generic null (`::`) to use the pre-defined output |
| inputs | Inputs are stored within the transformation but can be overridden by supplying new sources. To use the pre-defined inputs, pass generic null (`::`) |
```q
q).abc.myTransform[::] .im.io.with.target[`t] .im.io.create `kdb
id name city     ..
-----------------..
0  a    New York ..
1  b    Dublin   ..
2  c    Ottawa   ..
..
```