Grammar of Graphics in q

The documentation here serves as a brief introduction to the scripted visualization library called GG.

image1.png

For much more detailed information, refer to the .qp and .gg help pages under Help > Developer Function Reference. The Grammar of Graphics visualization library comes in two flavors:

  1. as a q library
  2. as a standalone DSL (domain specific language)

The former is discussed here and presented in further detail in the Developer Function Reference. The latter is presented under gg > dsl in the Developer Function Reference, and can be used in files that end in .gg within Developer, or within the Graphics Editor in the Transformer (Kx Analyst only).

The .qp and .gg module families provide data visualization capabilities. The public interface provides a grammar for specifying how plots should appear, based largely on the idea of mapping data variables (columns) to positional and aesthetic properties (e.g. x='City', y='Population', fill='Year'). In general, .qp defines the set of verbs, and .gg the objects.

A basic specification consists of a layer. A layer is a single set of mappings from variables to properties, together with some data and visual objects. For example, given a table tab, a simple scatter plot could be created with a single layer containing mappings from two columns in tab to x and y positional properties, the scales (axes) and a coordinate system that would be used when displaying the layer, and the data (tab) itself. This would define a complete specification that can be rendered into a scatter plot.
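As a minimal sketch of the single-layer case described above, the following builds one point layer and renders it. The table tab and its columns a and b are hypothetical sample data; only .qp.point and .qp.go come from this page.

```q
// Hypothetical sample data; the names tab, a and b are illustrative only
tab:([] a:til 20; b:20?100f)

// One layer: point geometry, column a mapped to x and column b to y.
// The generic null (::) requests no further options, so defaults apply.
layer:.qp.point[tab; `a; `b; ::]

// Render the specification as a 500px by 500px plot
.qp.go[500;500] layer
```

Keeping the layer in a variable before rendering makes it easy to reuse the same specification in a stack or an arrangement later.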

More advanced specifications can be created by stacking individual layer specifications to create a single new specification. When displayed, the stack of layers will all be rendered onto the same set of axes in the same coordinate system.

Building upon the expressiveness of the Grammar of Graphics, verbs have been added to the grammar for specifying the arrangement of disjoint specifications in order to create a new arranged specification. Any number of specifications can be arranged vertically or horizontally to create new specifications. These new specifications can also be arranged. The only limitation is that arrangements cannot be used within a stack, but a stack can appear in any arrangement. Also, stacks can be stacked themselves.

Basic visualization

The most basic way to visualize data is to use the high-level .qp.plot[…] API that takes a table, a list of columns to plot, and a dictionary of settings (or generic null).

t:([]x:5 * til 45; y: til 45; z: 45?`a`b`c)

// A 500px wide by 500px high plot of all columns of t
.qp.go[500;500] .qp.plot[t; (); ::]

// A plot of only column x
.qp.go[500;500] .qp.plot[t; `x; ::]

// A plot of x by y
.qp.go[500;500] .qp.plot[t; `x`y; ::]

// A plot of x, y, AND z
.qp.go[500;500] .qp.plot[t; `x`y`z; ::]

In these examples, the .qp.plot[…] section creates a specification – a plot description. The specification is provided to .qp.go[width; height; spec], which actually renders the plot description and sends it to the IDE environment. Here are a few example code snippets and their respective plots:

In the example below, we create a bar chart using this code:

t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
.qp.go[500;500] .qp.bar[t;`x;`y; ::]

image1.png

In the example below, we create a point (scatter) plot using this code:

t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
.qp.go[500;500] .qp.point[t;`x;`y; ::]

image2.png

In the example below, we create a boxplot using this code:

t:([] x:5 * til 45; y: 45?til 10; z: 45?`a`b`c)
.qp.go[500;500] .qp.boxplot[t;`z;`x; ::]

image3.png

This is not the only way to create plot specifications. Highly customized plots can be described by using the Grammar of Graphics rather than the .qp.plot facility.

Plot specification

A plot is created from an arrangement of stacks of layers.

Layers

At the most basic level, a single layer can be a plot. A layer is a collection of the following properties:

  • Data
  • Statistical transform (optional)
  • Geometry
  • Aesthetic mappings
  • Scales
  • Coordinate system

The data is the table we want to visualize. A statistical transform can be run on the data before visualization to transform the data in any specified way (the default transform is the identity, applying no transformation).

The geometry is the visual mark that will be made for each data record. There are a number of geometries available (see Creating a layer), such as point, line, rect, etc.

A set of aesthetic mappings maps variables from the result of the statistical transform to attributes of the plot. Each geometry has a set of required mappings and a set of optional mappings. For example, a point geometry requires x and y positions to be specified. Optionally, when using a point, the fill colour, the stroke colour, the alpha, and the size can also be mapped.

t:([]price:1 2 3; volume:9 8 7; sym:`a`b`c)

For example, if t above is the data for a layer, a possible aesthetic mapping using a point geometry could be (x='price', y='volume', fill='sym').

The scales govern the mapping between data variables and aesthetic properties. There are positional and aesthetic scales, for positional and aesthetic properties respectively. Positional properties can have scales such as linear, log, power, etc. Aesthetic scales can be gradient, circle radius, line size, etc.

Finally, a coordinate system for the mapping to occur in must be present in the layer as well. By default, the coordinate system is assumed to be Rectangular.

Creating a layer

In .qp, each geometry is a function. The following geometries are available:

.qp.histogram [data; …]
.qp.line [data; …]
.qp.hbar [data; …]
.qp.hhistogram [data; …]
.qp.path [data; …]
.qp.segment [data; …]
.qp.interval [data; …]
.qp.hinterval [data; …]
.qp.quantile [data; …]
.qp.rect [data; …]
.qp.text [data; …]
.qp.area [data; …]
.qp.bar [data; …]
.qp.ribbon [data; …]
.qp.boxplot [data; …]
.qp.hboxplot [data; …]
.qp.polygon [data; …]
.qp.heatmap [data; …]
.qp.tile [data; …]
.qp.smooth [data; …]
.qp.point [data; …]
.qp.hline [data; …]
.qp.vline [data; …]

Each of these is a function where the first argument is the data to be visualized. The following arguments change based on the geometry, and are the columns to map to the required aesthetic mappings for the given geometry. For example, a point geometry requires an x and y position, so the signature for creating a layer with a point geometry is:

.qp.point[t; `price; `volume; ::]

That last argument is a slot for options and customizations. Passing in generic null will create a basic layer. Every geometry has this same last argument.

Customizing a layer

The basic plot can be customized by joining options in place of the last argument. The options are all in the .qp.s sub-namespace. For example, to add a new fill mapping with an associated scale:

.qp.point[t; `price; `volume]
    .qp.s.aes   [`fill; `sym]
  , .qp.s.scale [`fill; .gg.scale.colour.cat10]

The generic null is omitted, and a list of joined options is passed instead. The first is a new aesthetic mapping using .qp.s.aes[…], which takes the aesthetic being mapped to as a symbol, followed by the column name. This is joined with a new scale governing the fill aesthetic mapping, using .qp.s.scale[…]. The scale provided here, .gg.scale.colour.cat10, is one of the options for categorical color scales. It defines 10 distinct colors to map to distinct symbols in the data. Other options can be added by joining them in the same way.
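To illustrate joining a further option, the sketch below extends the same layer with a log scale on the x position and renders the result. It uses only functions named on this page; the variable name spec is illustrative, and combining these particular options is an assumption rather than a documented recipe.

```q
t:([] price:1 2 3; volume:9 8 7; sym:`a`b`c)

// Options are joined with , and passed in place of the generic null
spec:.qp.point[t; `price; `volume]
      .qp.s.aes[`fill; `sym]
    , .qp.s.scale[`fill; .gg.scale.colour.cat10]
    , .qp.s.scale[`x; .gg.scale.log]

// Render the customized layer
.qp.go[500;500] spec
```

Each option is an independent item in the joined list, so one can be removed or swapped without touching the others.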

There are a number of settings available, with more information on each in Help > Developer Function Reference:

.qp.s.aes[aesthetic; column]     / Add a new aesthetic mapping (column to attribute);
                                 /   accompany with suitable scale.
.qp.s.scale[aesthetic; scale]    / Add a new scale governing the aesthetic mapping.
.qp.s.aggr[aggregation]          / Register an aggregation for heatmaps, histograms
.qp.s.geom[settings]             / Geometry-specific setting dictionary
.qp.s.labels[labels]             / Labels for x, y, fill, color, alpha
.qp.s.theme[theme]               / Apply a new theme
.qp.s.stat[statTransform]        / Add or change the statistical transform function
.qp.s.binx[d; s; p]              / Change X-bin settings for heatmap, histogram
.qp.s.biny[d; s; p]              / Change Y-bin settings for heatmap, histogram
.qp.s.secondary[label]           / Specify layer depends on another in the same frame
.qp.s.primary[label]             / Specify layer can distribute data
                                 /   to other layers in the same frame
.qp.s.link[label]                / Register a two-way dependency between this layer
                                 /   and another in a separate frame
.qp.s.textalign[alignment]       / Change the text alignment for text geometries
.qp.s.legend[title; legend]
.qp.s.coord[coords]              / Change the coordinate system of the frame

Positional scales:

.gg.scale.default / transform into a default scale for the given data type
.gg.scale.linear
.gg.scale.log
.gg.scale.power[degree]
.gg.scale.categorical[sortFunction]
.gg.scale.date
.gg.scale.datetime
.gg.scale.minute
.gg.scale.month
.gg.scale.second
.gg.scale.time
.gg.scale.timespan
.gg.scale.timestamp
.gg.scale.weekday

Aesthetic scales:

.gg.scale.colour.cat10                  / discrete color scale of 10 colors
.gg.scale.colour.cat20                  / discrete color scale of 20 colors
.gg.scale.colour.cat[colors]            / discrete color scale of the given colors
.gg.scale.gradient[start;end]
.gg.scale.gradient2[middleValue;start;middle;end]
.gg.scale.alpha[min;max]                / alpha (opacity) scale
.gg.scale.circle.area[min;max]
.gg.scale.circle.radius[min;max]
.gg.scale.line.size[min;max]

Coordinate systems:

.gg.coords.rect                         / rectangular/Cartesian coordinates
.gg.coords.polar                        / polar coordinates

More detail on all of these can be found in their respective QDoc pages.

Stacking layers

Multiple layers can be stacked together to create more interesting plots. For example, if there are two tables to visualize:

tableA : ([]a: 1 2 3; b: 4 5 6)
tableB : ([]a: 9 8 7; b: 6 5 4)

and a layer for each table:

.qp.point[tableA; `a; `b; ::]
.qp.line[tableB; `a; `b; ::]

both layers could be rendered on the same axes by stacking with .qp.stack (…):

.qp.stack (
    .qp.point[tableA; `a; `b; ::];
    .qp.line[tableB; `a; `b; ::]
)

The positional (X and Y) scales and the coordinate system of the first layer in the stack will be inherited by all other layers. For example, if both layers in the above stack should be plotted in a log-log polar plot, it is sufficient to update only the first specification:

.qp.stack (
    .qp.point[tableA; `a; `b]
        .qp.s.scale [`x; .gg.scale.log]
      , .qp.s.scale [`y; .gg.scale.log]
      , .qp.s.coord [.gg.coords.polar];
    .qp.line[tableB; `a; `b; ::]
)

These stacks can be used to create data-specific visualizations, such as the following depiction of flights, which is a stack of polygon, segment, and point geometries:

image4.png

The specification for the above depiction of airports and flights would be:

.qp.stack (
    .qp.polygon[world; `lon; `lat; ::];
    .qp.point[airports; `lon; `lat; ::];
    .qp.segment[flights; `slon; `slat; `dlon; `dlat; ::]
)

This plot also makes use of one of the pre-built color themes, e.g. .gg.theme.dark. See the section below regarding themes.

Arranging layers

Once multiple plots have been constructed, it is possible to arrange the individual plots in a single visual display. Both of the plots above could be laid out horizontally with:

.qp.horizontal (
    .qp.point[tableA; `a; `b; ::];
    .qp.line[tableB; `a; `b; ::]
)

or vertically with:

.qp.vertical (
    .qp.point[tableA; `a; `b; ::];
    .qp.line[tableB; `a; `b; ::]
)

Arrangements can be arranged as well, so more complicated arrangements can be constructed by nesting the arrangements:

.qp.vertical (
    .qp.point[tableA; `a; `b; ::];
    .qp.horizontal (
        .qp.line[tableB; `a; `b; ::];
        .qp.path[tableB; `a; `b; ::]
    )
)
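Because a stack can appear in any arrangement, stacked and single-layer plots can be mixed in one display. The following is a sketch using the two tables defined earlier; the variable name spec and the 600px size are illustrative choices.

```q
tableA:([] a:1 2 3; b:4 5 6)
tableB:([] a:9 8 7; b:6 5 4)

// A stack may be a member of an arrangement,
// but an arrangement cannot be placed inside a stack
spec:.qp.vertical (
    .qp.stack (
        .qp.point[tableA; `a; `b; ::];
        .qp.line[tableB; `a; `b; ::]);
    .qp.path[tableB; `a; `b; ::])

// Top frame: point and line on shared axes; bottom frame: the path alone
.qp.go[600;600] spec
```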

Arrangements can be used to create effective summaries of data. Coupled with dependency specifications, arrangements can also be very effective in data exploration.

image5.png Summary of all data along multiple columns

Interaction

The images produced are interactive. Points can be interrogated by clicking the image. A table of matching records will appear under the image. One such table will appear for every layer in the plot clicked (independently arranged visuals do not contribute).

A plot can also be zoomed by pressing Ctrl+Click (Windows and Linux) or ⌘Click (macOS) and dragging a region within the image in a single plot. The first point must be within the plot axes. The two points of the drag (start and end) define the region to zoom into. After releasing the mouse button, a new image will be drawn and will appear as a new tab.

Specifying dependencies

Two sorts of dependencies exist within .qp. The first is between layers in independent frames within an arrangement. Consider the following specification:

t : ([]x:5 * til 45; y: til 45; z: 45?`a`b`c);

.qp.vertical (
    .qp.point[t; `x; `y]
        .qp.s.link[`myid];
    .qp.line[t; `z; `x]
        .qp.s.link[`myid])

In the above, there are two layers which would render one above the other, since the arrangement is vertical. Both layers link the same identifier (myid). Because of this, whenever one of the layers is drilled into, the other linked layers will render the same subset of the data as the layer that was interrogated.

The other concept of a dependency exists within a single frame. This concept is useful when a stack of several layers exists where one or more of the layers are really a function of another layer, as is the case between a scatterplot and a scatterplot smooth (a line drawn through the scatterplot). This is depicted in the following:

.qp.stack (
    .qp.point[t; `x; `y]
        .qp.s.primary[`myid];
    .qp.smooth[t; `x; `y]
        .qp.s.secondary[`myid])

In this example, the scatterplot smooth is a secondary layer, and the scatter is a primary layer. Since these use the same identifier, whenever the frame is drilled into, only the scatter will be drilled into, and the smooth will be given the drilled scatter data so that it is always in sync.

Rotating aesthetics

Rather than zooming into the same axes, it is often useful to switch axes during a drilldown. For example, in the case of a bar chart with a categorical column, whenever the user drills into a category, the result can show the subcategories of the first category by mapping the subcategory column in the second plot. This can be repeated for however many subcategories exist. This is done simply by specifying a list of columns for a single axis as in the following:

.qp.histogram[sales; `region`province`category`subcategory; ::]

In the first rendering, a histogram of region will be displayed. Drilling into any single region will result in a histogram of the provinces in that region. Further drilling into a province will display a histogram of the categories in the province.
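For experimentation, a small table with this column hierarchy could be built as follows. This is a sketch only: the sales data, its values, and the plot size are all hypothetical, and the interactive drilldown itself takes place in the rendered image.

```q
// Hypothetical data: every column value here is illustrative only
sales:([]
    region:`east`east`west`west`west;
    province:`p1`p2`p3`p3`p4;
    category:`food`toys`food`food`toys;
    subcategory:`fruit`games`dairy`fruit`games)

// The list of columns given for the single axis sets the drilldown order
.qp.go[500;500] .qp.histogram[sales; `region`province`category`subcategory; ::]
```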