Grammar of Graphics in q
The documentation here serves as a brief introduction to the scripted visualization library for kdb+ called GG.
For much more detailed information, refer to the .qp and .gg API references.
The .qp and .gg module families provide data visualization capabilities. The public interface provides a grammar for specifying how plots should appear, based largely on the idea of mapping data variables (columns) to positional and aesthetic properties (e.g. x='City', y='Population', fill='Year'). In general, .qp defines the set of verbs, and .gg the objects.
A basic specification consists of a layer or geometry. A layer is a single set of mappings from variables to properties, together with some data and visual objects. All geometry functions take their positional properties as arguments, and aesthetic properties as optional settings:
.qp.go[500; 500] .qp.scatter[table; `col1; `col2] // <- table and positional properties
.qp.s.aes[`fill; `col3] // <- aesthetic fill property
More advanced specifications can be created by stacking individual layer specifications to create a single new specification. When displayed, the stack of layers will all be rendered onto the same set of axes in the same coordinate system.
Building upon the expressiveness of the Grammar of Graphics, verbs have been added to the grammar for arranging disjoint specifications into a new arranged specification. Any number of specifications can be arranged vertically or horizontally to create new specifications, and these new specifications can also be arranged. The only limitation is that arrangements cannot be used within a stack, although a stack can appear in any arrangement. Note that stacks can themselves be stacked.
Basic visualization
The most basic way to visualize data is to use the high-level .qp.plot[…] API, which takes a table, a list of columns to plot, and a dictionary of settings (or generic null).
t:([]x:5 * til 45; y: til 45; z: 45?`a`b`c)
// A 500px wide by 500px high plot of all columns of t
.qp.go[500;500] .qp.plot[t; (); ::]
// A plot of only column x
.qp.go[500;500] .qp.plot[t; `x; ::]
// A plot of x by y
.qp.go[500;500] .qp.plot[t; `x`y; ::]
// A plot of x, y, AND z
.qp.go[500;500] .qp.plot[t; `x`y`z; ::]
In these examples, the .qp.plot[…] section creates a specification: a plot description. The specification is provided to .qp.go[width; height; spec], which renders the plot description and sends it to the IDE environment. Here are a few more example code snippets:
The following code creates a bar chart:
t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
.qp.go[500;500] .qp.bar[t;`x;`y; ::]
The following code creates a point (scatter) plot:
t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
.qp.go[500;500] .qp.point[t;`x;`y; ::]
The following code creates a boxplot:
t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
.qp.go[500;500] .qp.boxplot[t;`z;`x; ::]
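Because .qp.bar[…] and the other geometry functions only build a specification, that specification is an ordinary q value: it can be held in a variable and handed to .qp.go more than once. The sketch below assumes the GG library is already loaded in the session, as in the examples above; the name spec and the second plot size are illustrative.

```q
t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
// build the plot description once; nothing is drawn yet
spec:.qp.bar[t; `x; `y; ::]
// render the same specification at two different sizes
.qp.go[500; 500] spec
.qp.go[800; 300] spec
```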
This is not the only way to create plot specifications. Highly customized plots can be described by using the Grammar of Graphics rather than the .qp.plot facility.
Plot specification
A plot is created from an arrangement of stacks of layers.
Layers
At the most basic level, a single layer can be a plot. A layer is a collection of the following properties:
- Data
- Statistical transform (optional)
- Geometry
- Aesthetic mappings
- Scales
- Coordinate system
The data is the table we want to visualize. A statistical transform can be run on the data before visualization to transform the data in any specified way (the default transform is the identity, applying no transformation).
The geometry is the visual mark made for each data record. A number of geometries are available (see Creating a layer), such as point, line, rect, etc.
A set of aesthetic mappings maps variables from the result of the statistical transform to attributes of the plot. Each geometry has a set of required mappings and a set of optional mappings. For example, a point geometry requires x and y positions to be specified. Optionally, when using a point, the fill colour, the stroke colour, the alpha, and the size can also be mapped.
t:([]price:1 2 3; volume:9 8 7; sym:`a`b`c)
For example, if t above is the data for a layer, a possible aesthetic mapping using a point geometry could be (x='price', y='volume', fill='sym').
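That mapping can be written as a single point layer. In this minimal sketch, which assumes the GG library is loaded, the positional mappings (x='price', y='volume') are the geometry's arguments, while fill='sym' is passed as an option through .qp.s.aes. The variable name layer is illustrative.

```q
t:([]price:1 2 3; volume:9 8 7; sym:`a`b`c)
// positional mappings are arguments; the fill mapping replaces the trailing ::
layer:.qp.point[t; `price; `volume] .qp.s.aes[`fill; `sym]
.qp.go[500; 500] layer
```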
The scales govern the mapping between data variables and plot properties. There are positional and aesthetic scales, for positional and aesthetic properties respectively. Positional properties can have scales such as linear, log, power, etc. Aesthetic scales can be gradient, circle radius, line size, etc.
Finally, the layer needs a coordinate system in which the mapping takes place. By default, the coordinate system is rectangular.
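Putting these properties together, the sketch below describes one layer with data (t), a point geometry, positional and aesthetic mappings, an aesthetic and a positional scale, and a non-default coordinate system. It uses only the .qp.s and .gg names that appear later in this document, assumes the GG library is loaded, and leaves the statistical transform at its default.

```q
t:([]price:1 2 3; volume:9 8 7; sym:`a`b`c)
// data t, point geometry, x=price and y=volume, fill=sym with a categorical
// colour scale, a log scale on y, and polar instead of rectangular coordinates
layer:.qp.point[t; `price; `volume]
  .qp.s.aes[`fill; `sym]
  , .qp.s.scale[`fill; .gg.scale.colour.cat10]
  , .qp.s.scale[`y; .gg.scale.log]
  , .qp.s.coord[.gg.coords.polar]
.qp.go[500; 500] layer
```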
Creating a layer
In .qp, each geometry is a function. See the geometries API references for more information. The first argument is the data to be visualized. The following arguments depend on the geometry: they are the positional columns that geometry requires. For example, a point geometry requires an x and a y position, so the signature for creating a layer with a point geometry is:
.qp.point[t; `price; `volume; ::]
That last argument is a slot for options and customizations. Passing in generic null (::) creates a basic layer. Every geometry has this same last argument.
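Only the positional arguments change from one geometry to the next; the data comes first and the options slot comes last. The sketch below assumes the GG library is loaded. The table segs and its column names are hypothetical, and the four positional arguments to .qp.segment follow the argument order used in the flights example later in this document.

```q
t:([]price:1 2 3; volume:9 8 7; sym:`a`b`c)
// point and line both take data, x, y, then the options slot
p:.qp.point[t; `price; `volume; ::]
l:.qp.line[t; `price; `volume; ::]
// hypothetical segment data: start (x0, y0) and end (x1, y1) of each segment
segs:([]x0:1 2; y0:9 8; x1:2 3; y1:8 7)
s:.qp.segment[segs; `x0; `y0; `x1; `y1; ::]
.qp.go[500; 500] s
```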
Customizing a layer
The basic plot can be customized by joining options in place of the last argument. The options are all in the .qp.s sub-namespace. For example, to add a new fill mapping with an associated scale:
.qp.point[t; `price; `volume]
.qp.s.aes [`fill; `sym]
, .qp.s.scale [`fill; .gg.scale.colour.cat10]
The generic null is omitted, and a list of joined options is passed instead. The first is a new aesthetic mapping using .qp.s.aes[…], which takes the aesthetic being mapped (fill) as a symbol, followed by a column name. This is joined with a new scale governing the fill aesthetic mapping, using .qp.s.scale[…]. The scale provided here, .gg.scale.colour.cat10, is one of the options for categorical color scales. It defines 10 distinct colors to map to distinct symbols in the data. Other options can be added by joining them in the same way.
There are a number of settings available. Refer to the layer settings API reference for more information.
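Because a joined list of options is an ordinary q value, it can be built once and reused across layers. A minimal sketch, assuming the GG library is loaded; the variable names logAxes, pA and pB are illustrative, and the log scales are the ones used with .qp.s.scale elsewhere in this document.

```q
tableA:([]a:1 2 3; b:4 5 6)
tableB:([]a:9 8 7; b:6 5 4)
// one set of options, held in an ordinary q variable
logAxes:.qp.s.scale[`x; .gg.scale.log] , .qp.s.scale[`y; .gg.scale.log]
// the same options applied to two separate layers
pA:.qp.point[tableA; `a; `b] logAxes
pB:.qp.point[tableB; `a; `b] logAxes
.qp.go[500; 500] pA
.qp.go[500; 500] pB
```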
Stacking layers
Multiple layers can be stacked together to create more interesting plots. For example, if there are two tables to visualize:
tableA : ([]a: 1 2 3; b: 4 5 6)
tableB : ([]a: 9 8 7; b: 6 5 4)
and a layer for each table:
.qp.point[tableA; `a; `b; ::]
.qp.line[tableB; `a; `b; ::]
both layers could be rendered on the same axes by stacking them with .qp.stack (…):
.qp.stack (
.qp.point[tableA; `a; `b; ::];
.qp.line[tableB; `a; `b; ::]
)
The positional (X and Y) scales and the coordinate system of the first layer in the stack will be inherited by all other layers. For example, if both layers in the above stack should be plotted in a log-log polar plot, it is sufficient to update only the first specification:
.qp.stack (
.qp.point[tableA; `a; `b]
.qp.s.scale [`x; .gg.scale.log]
, .qp.s.scale [`y; .gg.scale.log]
, .qp.s.coord [.gg.coords.polar];
.qp.line[tableB; `a; `b; ::]
)
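Stacks can themselves be stacked, as noted at the start of this document. A minimal sketch, assuming the GG library is loaded and that .qp.stack accepts an existing stack as one of its elements; the names inner and outer are illustrative.

```q
tableA:([]a:1 2 3; b:4 5 6)
tableB:([]a:9 8 7; b:6 5 4)
// an inner stack of two layers
inner:.qp.stack (.qp.point[tableA; `a; `b; ::]; .qp.line[tableB; `a; `b; ::])
// the inner stack becomes one element of an outer stack; all layers share one set of axes
outer:.qp.stack (inner; .qp.path[tableB; `a; `b; ::])
.qp.go[500; 500] outer
```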
These stacks can be used to create data-specific visualizations, such as a depiction of airports and flights built from a stack of polygon, point, and segment geometries. The specification for such a depiction would be:
.qp.stack (
.qp.polygon[world; `lon; `lat; ::];
.qp.point[airports; `lon; `lat; ::];
.qp.segment[flights; `slon; `slat; `dlon; `dlat; ::]
)
The rendered plot also uses one of the pre-built color themes, such as .gg.theme.dark (see the section below regarding themes).
Arranging layers
Once multiple plots have been constructed, it is possible to arrange the individual plots in a single visual display. Both of the plots above could be laid out horizontally with:
.qp.horizontal (
.qp.point[tableA; `a; `b; ::];
.qp.line[tableB; `a; `b; ::]
)
or vertically with:
.qp.vertical (
.qp.point[tableA; `a; `b; ::];
.qp.line[tableB; `a; `b; ::]
)
Arrangements can themselves be arranged, so more complicated layouts can be constructed by nesting arrangements:
.qp.vertical (
.qp.point[tableA; `a; `b; ::];
.qp.horizontal (
.qp.line[tableB; `a; `b; ::];
.qp.path[tableB; `a; `b; ::]
)
)
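A stack can appear anywhere inside an arrangement (the reverse is not allowed). The sketch below assumes the GG library is loaded and that the width and height passed to .qp.go size the whole arrangement; the names both and grid and the plot size are illustrative.

```q
tableA:([]a:1 2 3; b:4 5 6)
tableB:([]a:9 8 7; b:6 5 4)
// a stack of two layers sharing one set of axes
both:.qp.stack (.qp.point[tableA; `a; `b; ::]; .qp.line[tableB; `a; `b; ::])
// the stack sits beside a vertical arrangement of two more layers
grid:.qp.horizontal (both; .qp.vertical (.qp.point[tableA; `a; `b; ::]; .qp.path[tableB; `a; `b; ::]))
.qp.go[800; 500] grid
```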
Arrangements can be used to create effective summaries of data. Coupled with dependency specifications, arrangements can also be very effective in data exploration.
[Figure: summary of all data along multiple columns]
Interaction
The images produced are interactive. Points can be interrogated by clicking the image. A table of matching records will appear under the image, one for every layer in the plot that was clicked. (Independently arranged visuals do not contribute.)
A plot can also be zoomed by pressing Ctrl+Click (Windows and Linux) or ⌘Click (macOS) and dragging a region within the image in a single plot. The first point must be within the plot axes. The two points of the drag (start and end) define the region to zoom into. After releasing the mouse button, a new image will be drawn and will appear as a new tab.
Specifying dependencies
Two sorts of dependencies exist within .qp. The first is between layers in independent frames within an arrangement. Consider the following specification:
t : ([]x:5 * til 45; y: til 45; z: 45?`a`b`c);
.qp.vertical (
.qp.point[t; `x; `y]
.qp.s.link[`myid];
.qp.line[t; `z; `x]
.qp.s.link[`myid])
In the above, there are two layers, which render one above the other because they are arranged vertically. Both layers link the same identifier (myid). Because of this, whenever one of the layers is drilled into, the other linked layers render the same subset of the data as the layer that was interrogated.
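The phrase "the other linked layers" suggests that more than two layers can share an identifier. The sketch below assumes the GG library is loaded and that this is the case; it links three layers in a horizontal arrangement so that drilling into any one of them filters the other two. The bar layer and the plot size are illustrative additions.

```q
t:([]x:5 * til 45; y: til 45; z: 45?`a`b`c)
// all three layers link `myid, so drilling into one filters the others
.qp.go[900; 300] .qp.horizontal (
  .qp.point[t; `x; `y] .qp.s.link[`myid];
  .qp.line[t; `z; `x] .qp.s.link[`myid];
  .qp.bar[t; `x; `y] .qp.s.link[`myid])
```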
The other kind of dependency exists within a single frame. It is useful when a stack of several layers exists where one or more of the layers are really a function of another layer, as is the case between a scatterplot and a scatterplot smooth (a line drawn through the scatterplot). This is depicted in the following:
.qp.stack (
.qp.point[t; `x; `y]
.qp.s.primary[`myid];
.qp.smooth[t; `x; `y]
.qp.s.secondary[`myid])
In this example, the scatterplot smooth is a secondary layer, and the scatter is a primary layer. Since these use the same identifier, whenever the frame is drilled into, only the scatter will be drilled into, and the smooth will be given the drilled scatter data so that it is always in sync.
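Since "one or more of the layers" can be a function of another layer, several secondary layers can presumably depend on one primary layer. The sketch below assumes the GG library is loaded and that secondary layers may share one identifier; the identifier fit and the extra line layer are illustrative.

```q
t:([]x:5 * til 45; y: til 45; z: 45?`a`b`c)
// one primary scatter; the smooth and the line are both secondary to it,
// so after a drill-down both are given the drilled scatter data
.qp.go[500; 500] .qp.stack (
  .qp.point[t; `x; `y] .qp.s.primary[`fit];
  .qp.smooth[t; `x; `y] .qp.s.secondary[`fit];
  .qp.line[t; `x; `y] .qp.s.secondary[`fit])
```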