Grammar of Graphics in q
The documentation here serves as a brief introduction to the scripted visualization library called GG.
For much more detailed information, refer to the
.gg help pages
under Help > Analyst Function Reference. The Grammar of Graphics visualization library comes
in two flavors:
- as a q library
- as a standalone DSL (domain specific language)
The former is discussed here and presented in further detail in Analyst Function Reference.
The latter is presented under
gg > dsl in the
Analyst Function Reference, and can be used in files that end in
Analyst, or within the Graphics Editor in the Transformer (Kx Analyst only).
The .qp and .gg module families provide data visualization capabilities. The public interface
provides a grammar for specifying how plots should appear, based largely on the idea of mapping
data variables (columns) to positional and aesthetic properties (e.g. fill='Year'). In general,
.qp defines the set of verbs, and .gg the objects.
A basic specification consists of a layer. A layer is a single set of mappings from variables to
properties, together with some data and visual objects. For example, given a table tab, a simple
scatter plot could be created with a single layer containing mappings from two columns in tab
to x and y positional properties, the scales (axes) and a coordinate system that would be
used when displaying the layer, and the data (tab) itself. This would define a complete
specification that can be rendered into a scatter plot.
More advanced specifications can be created by stacking individual layer specifications to create a single new specification. When displayed, the stack of layers will all be rendered onto the same set of axes in the same coordinate system.
Building upon the expressiveness of the Grammar of Graphics, verbs have been added to the grammar for specifying the arrangement of disjoint specifications in order to create a new arranged specification. Any number of specifications can be arranged vertically or horizontally to create new specifications. These new specifications can also be arranged. The only limitation is that arrangements cannot be used within a stack, but a stack can appear in any arrangement. Also, stacks can be stacked themselves.
The most basic way to visualize data is to use the high-level
.qp.plot[…] API that takes a table,
a list of columns to plot, and a dictionary of settings (or generic null).
    t:([] x:5 * til 45; y: til 45; z: 45?`a`b`c)

    // A 500px wide by 500px high plot of all columns of t
    .qp.go[500;500] .qp.plot[t; (); ::]

    // A plot of only column x
    .qp.go[500;500] .qp.plot[t; `x; ::]

    // A plot of x by y
    .qp.go[500;500] .qp.plot[t; `x`y; ::]

    // A plot of x, y, AND z
    .qp.go[500;500] .qp.plot[t; `x`y`z; ::]
In these examples, the
.qp.plot[…] section creates a specification – a plot description. The
specification is provided to
.qp.go[width; height; spec], which actually renders the plot
description and sends it to the IDE environment. Here are a few example code snippets and
their respective plots:
In the example below, we create a bar chart using this code:

    t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
    .qp.go[500;500] .qp.bar[t; `x; `y; ::]

In the example below, we create a point (scatter) plot using this code:

    t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
    .qp.go[500;500] .qp.point[t; `x; `y; ::]

In the example below, we create a boxplot using this code:

    t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
    .qp.go[500;500] .qp.boxplot[t; `z; `x; ::]
This is not the only way to create plot specifications. Very customized plots
can be described by using the Grammar of Graphics rather than the high-level
.qp.plot[…] API.
A plot is created from an arrangement of stacks of layers.
At the most basic level, a single layer can be a plot. A layer is a collection of the following properties:
- Data
- Statistical transform (optional)
- Geometry
- Aesthetic mappings
- Scales
- Coordinate system
The data is the table we want to visualize. A statistical transform can be run on the data before visualization to transform the data in any specified way (the default transform is the identity, applying no transformation).
The geometry is the visual mark that will be made for each data record.
There are a number of geometries available (see Creating a layer), such as
point, line, and bar.
A set of aesthetic mappings maps variables from the result of the
statistical transform to attributes of the plot. Each geometry has a set
of required mappings and a set of optional mappings. For example, a
point geometry requires x and y positions to be specified.
Optionally, when using a point geometry, the fill colour, the
alpha, and the size can also be mapped.
    t:([] price:1 2 3; volume:9 8 7; sym:`a`b`c)
For example, if t above is the data for a layer, a possible aesthetic
mapping using a point geometry could be price mapped to x and volume
mapped to y.
The scales govern the mapping between data variables and aesthetic
properties. There are positional and aesthetic scales, for
positional and aesthetic properties respectively. Positional properties
can have scales such as linear, log, power, etc. Aesthetic scales
include colour, alpha, circle area, line size, etc.
Finally, a coordinate system for the mapping to occur in must be present in the layer as well. By default, the coordinate system is assumed to be Rectangular.
Creating a layer
In .qp, each geometry is a function. The following geometries are
available:

    .qp.histogram  [data; …]
    .qp.line       [data; …]
    .qp.hbar       [data; …]
    .qp.hhistogram [data; …]
    .qp.path       [data; …]
    .qp.segment    [data; …]
    .qp.interval   [data; …]
    .qp.hinterval  [data; …]
    .qp.quantile   [data; …]
    .qp.rect       [data; …]
    .qp.text       [data; …]
    .qp.area       [data; …]
    .qp.bar        [data; …]
    .qp.ribbon     [data; …]
    .qp.boxplot    [data; …]
    .qp.hboxplot   [data; …]
    .qp.polygon    [data; …]
    .qp.heatmap    [data; …]
    .qp.tile       [data; …]
    .qp.smooth     [data; …]
    .qp.point      [data; …]
    .qp.hline      [data; …]
    .qp.vline      [data; …]
Each of these is a function where the first argument is the data to be
visualized. The arguments that follow depend on the geometry, and
are the columns to map to the required aesthetic mappings for the given
geometry. For example, a point geometry requires an x and a y
position, so the signature for creating a layer with a point geometry is:

    .qp.point[t; `price; `volume; ::]
That last argument is a slot for options and customizations. Passing in generic null will create a basic layer. Every geometry has this same last argument.
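As a minimal sketch of that shared shape, the same table can be passed to several geometries with generic null in the options slot. This assumes a Kx Analyst session in which the .qp namespace is loaded, and that the line and bar geometries take x and y columns in the same way they are used elsewhere on this page; the sample table is illustrative only.

```q
// Illustrative sample data (same shape as the earlier example table).
t:([] price:1 2 3; volume:9 8 7; sym:`a`b`c)

// Each call has the form geometry[data; required columns...; options].
// Generic null (::) in the last slot requests a basic layer.
pointSpec:.qp.point[t; `price; `volume; ::]
lineSpec:.qp.line[t; `price; `volume; ::]
barSpec:.qp.bar[t; `price; `volume; ::]

// Render one of the specifications at 500px by 500px.
.qp.go[500;500] pointSpec
```

Holding each specification in a variable before calling .qp.go keeps the description of the plot separate from its rendering.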
Customizing a layer
The basic plot can be customized by joining options in place of the
last argument. The options are all in the .qp.s sub-namespace. For
example, to add a new fill mapping with an associated scale:

    .qp.point[t; `price; `volume]
        .qp.s.aes  [`fill; `sym]
        , .qp.s.scale [`fill; .gg.scale.colour.cat10]
The generic null is omitted, and a list of joined options is passed
instead. The first is a new aesthetic mapping using .qp.s.aes[…],
which takes the aesthetic being mapped to as a symbol, followed by the
column name. This is joined with a new scale governing the fill
aesthetic mapping, using .qp.s.scale[…]. The scale provided here,
.gg.scale.colour.cat10, is one of the options for categorical color
scales. It defines 10 distinct colors to map to distinct symbols in the
data. Other options can be added by joining them in the same way.
There are a number of settings available, with more information on each in Help > Analyst Function Reference:
    .qp.s.aes[aesthetic; column]    / Add a new aesthetic mapping (column to attribute);
                                    / accompany with suitable scale.
    .qp.s.scale[aesthetic; scale]   / Add a new scale governing the aesthetic mapping.
    .qp.s.aggr[aggregation]         / Register an aggregation for heatmaps, histograms
    .qp.s.geom[settings]            / Geometry-specific setting dictionary
    .qp.s.labels[labels]            / Labels for x, y, fill, color, alpha
    .qp.s.theme[theme]              / Apply a new theme
    .qp.s.stat[statTransform]       / Add or change the statistical transform function
    .qp.s.binx[d; s; p]             / Change X-bin settings for heatmap, histogram
    .qp.s.biny[d; s; p]             / Change Y-bin settings for heatmap, histogram
    .qp.s.secondary[label]          / Specify layer depends on another in the same frame
    .qp.s.primary[label]            / Specify layer can distribute data
                                    / to other layers in the same frame
    .qp.s.link[label]               / Register a two-way dependency between this layer
                                    / and another in a separate frame
    .qp.s.textalign[alignment]      / Change the text alignment for text geometries
    .qp.s.legend[title; legend]
    .qp.s.coord[coords]             / Change the coordinate system of the frame

    .gg.scale.default               / transform into a default scale for the given data type
    .gg.scale.linear
    .gg.scale.log
    .gg.scale.power[degree]
    .gg.scale.categorical[sortFunction]
    .gg.scale.date
    .gg.scale.datetime
    .gg.scale.minute
    .gg.scale.month
    .gg.scale.second
    .gg.scale.time
    .gg.scale.timespan
    .gg.scale.timestamp
    .gg.scale.weekday

    .gg.scale.colour.cat10          / discrete color scale of 10 colors
    .gg.scale.colour.cat20          / discrete color scale of 20 colors
    .gg.scale.colour.cat[colors]    / discrete color scale of the given colors
    .gg.scale.gradient[start;end]
    .gg.scale.gradient2[middleValue;start;middle;end]
    .gg.scale.alpha[min;max]        / alpha (opacity) scale
    .gg.scale.circle.area[min;max]
    .gg.scale.circle.radius[min;max]
    .gg.scale.line.size[min;max]

    .gg.coords.rect                 / rectangular/Cartesian coordinates
    .gg.coords.polar                / polar coordinates

More detail on all of these can be found in their respective QDoc pages.
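To illustrate how several of these settings combine, the following sketch joins an aesthetic mapping, two scales, and a coordinate system onto one point layer. It uses only options and scales that appear on this page, and assumes a Kx Analyst session with the .qp and .gg namespaces loaded; the sample table is illustrative.

```q
// Illustrative sample data.
t:([] price:1 2 3; volume:9 8 7; sym:`a`b`c)

// .qp.point is projected over its first three arguments; the joined
// options fill the final (options) slot:
//   - map the sym column to fill, with a 10-colour categorical scale
//   - use a log scale on the x axis
//   - draw the layer in polar coordinates
spec:.qp.point[t; `price; `volume]
    .qp.s.aes[`fill; `sym]
    , .qp.s.scale[`fill; .gg.scale.colour.cat10]
    , .qp.s.scale[`x; .gg.scale.log]
    , .qp.s.coord[.gg.coords.polar]

// Render the specification at 500px by 500px.
.qp.go[500;500] spec
```

Because the options are simply joined with `,`, any one of them can be removed or swapped without touching the rest of the layer definition.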
Multiple layers can be stacked together to create more interesting plots. For example, if there are two tables to visualize:

    tableA:([] a: 1 2 3; b: 4 5 6)
    tableB:([] a: 9 8 7; b: 6 5 4)

and a layer for each table:

    .qp.point[tableA; `a; `b; ::]
    .qp.line[tableB; `a; `b; ::]

both layers could be rendered on the same axes by stacking with .qp.stack:

    .qp.stack (
        .qp.point[tableA; `a; `b; ::];
        .qp.line[tableB; `a; `b; ::] )
The positional (X and Y) scales and the coordinate system of the first layer in the stack will be inherited by all other layers. For example, if both layers in the above stack should be plotted in a log-log polar plot, it is sufficient to update only the first specification:
    .qp.stack (
        .qp.point[tableA; `a; `b]
            .qp.s.scale [`x; .gg.scale.log]
            , .qp.s.scale [`y; .gg.scale.log]
            , .qp.s.coord [.gg.coords.polar];
        .qp.line[tableB; `a; `b; ::] )
These stacks can be used to create data-specific visualization, such as the following depiction of flights, which is a stack of polygon, segment, and point geometries:
The specification for the above depiction of airports and flights would be:
    .qp.stack (
        .qp.polygon[world; `lon; `lat; ::];
        .qp.point[airports; `lon; `lat; ::];
        .qp.segment[flights; `slon; `slat; `dlon; `dlat; ::] )
This plot also makes use of one of the pre-built color themes, e.g.
.gg.theme.dark. See the
section below regarding themes.
Once multiple plots have been constructed, it is possible to arrange the individual plots in a single visual display. Both of the plots above could be laid out horizontally with:
    .qp.horizontal (
        .qp.point[tableA; `a; `b; ::];
        .qp.line[tableB; `a; `b; ::] )

or vertically with:

    .qp.vertical (
        .qp.point[tableA; `a; `b; ::];
        .qp.line[tableB; `a; `b; ::] )

Arrangements can be arranged as well, so more complicated arrangements can be constructed by nesting the arrangements:

    .qp.vertical (
        .qp.point[tableA; `a; `b; ::];
        .qp.horizontal (
            .qp.line[tableB; `a; `b; ::];
            .qp.path[tableB; `a; `b; ::] ) )
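As noted earlier, a stack can appear inside any arrangement, although an arrangement cannot appear inside a stack. A minimal sketch of a stack used as one frame of an arrangement, assuming a Kx Analyst session with .qp loaded (the 800 by 400 size is an arbitrary choice):

```q
// Illustrative sample data.
tableA:([] a:1 2 3; b:4 5 6)
tableB:([] a:9 8 7; b:6 5 4)

// Left frame: point and line layers stacked on shared axes.
// Right frame: a single path layer on its own axes.
spec:.qp.horizontal (
    .qp.stack (
        .qp.point[tableA; `a; `b; ::];
        .qp.line[tableB; `a; `b; ::]);
    .qp.path[tableB; `a; `b; ::])

// Render the arranged specification at 800px wide by 400px high.
.qp.go[800;400] spec
```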
Arrangements can be used to create effective summaries of data. Coupled with dependency specifications, arrangements can also be very effective in data exploration.
Summary of all data along multiple columns
The images produced are interactive. Points can be interrogated by clicking the image. A table of matching records will appear under the image. One such table will appear for every layer in the plot clicked. (Independently-arranged visuals do not contribute).
A plot can also be zoomed by pressing Ctrl+Click (Windows and Linux) or ⌘Click (macOS) and dragging a region within the image in a single plot. The first point must be within the plot axes. The two points of the drag (start and end) define the region to zoom into. After releasing the mouse button, a new image will be drawn and will appear as a new tab.
Two sorts of dependencies exist within .qp. The first is between
layers in independent frames within an arrangement. Consider the
following:

    t:([] x:5 * til 45; y: til 45; z: 45?`a`b`c)
    .qp.vertical (
        .qp.point[t; `x; `y]
            .qp.s.link[`myid];
        .qp.line[t; `z; `x]
            .qp.s.link[`myid])

In the above, there are two layers which would render one above the
other, since they are arranged with .qp.vertical. Both layers link the
same identifier (myid). Because of
this, whenever one of the layers is drilled into, the other linked
layers will render the same subset of the data as the layer that was
drilled into.
The other concept of a dependency exists within a single frame. This concept is useful when a stack of several layers exists where one or more of the layers are really a function of another layer, as is the case between a scatterplot and a scatterplot smooth (a line drawn through the scatterplot). This is depicted in the following:

    .qp.stack (
        .qp.point[t; `x; `y]
            .qp.s.primary[`myid];
        .qp.smooth[t; `x; `y]
            .qp.s.secondary[`myid])

In this example, the scatterplot smooth is a secondary layer, and the scatter is a primary layer. Since these use the same identifier, whenever the frame is drilled into, only the scatter will be drilled into, and the smooth will be given the drilled scatter data so that it is always in sync.
Rather than zooming into the same axes, it is often useful to switch axes during a drilldown. For example, in the case of a bar chart with a categorical column, whenever the user drills into a category, the result can show the subcategories of the first category by mapping the subcategory column in the second plot. This can be repeated for however many subcategories exist. This is done simply by specifying a list of columns for a single axis as in the following:
.qp.histogram[sales; `region`province`category`subcategory; ::]
In the first rendering, a histogram of region will be displayed.
Drilling into any single region will result in a histogram of the
provinces in that region. Further drilling into a province will
display a histogram of the
categories in the