QuickCheck

QuickCheck is a property-based testing framework. This means that rather than testing individual cases, general properties of the system are tested. For example, given a function reverse, we might write unit tests such as:

3 2 1 ~ reverse 1 2 3
-3 -2 -1 ~ reverse -1 -2 -3

This function is clearly supposed to reverse its argument. Specific cases have been tested (numbers, specifically positive and negative). Perhaps cases have been forgotten, such as lists of tables, or the empty list. More generally, we want to test that this reverse function behaves as expected. Properties are defined using lambdas, where the arguments to the lambda represent a possible argument, and should return the boolean value true for success:

{ reverse[x,y] ~ reverse[y],reverse[x] }
{ x ~ reverse reverse x }

Then, the x and y arguments need to be scoped. These x and y arguments should represent all possible lists. We can specify that by writing the following:

.qch.forall2 [.qch.g.list .qch.g.int[]; .qch.g.list .qch.g.int[]] {
    reverse[x,y] ~ reverse[y],reverse[x] }

.qch.forall [.qch.g.list .qch.g.int[]] { x ~ reverse reverse x }

These are two fully-specified properties of the reverse function. In general, a property has three parts:

.qch.forall2 [.qch.g.list[];.qch.g.list[]] {reverse[x,y] ~ reverse[y],reverse[x]}
^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    1                     2                                   3

.qch.forallN creates the property object. N specifies the number of arguments to the property. This property takes two arguments, so we use .qch.forall2. If only a single argument is used, we can use .qch.forall, as in the second property. Properties with up to seven arguments are supported.
Argument generators. These specify the values that the arguments can possibly be. There should be one generator for every argument to the property. The generators are semicolon delimited in square brackets. There is a very flexible generator library available (see Writing Generators).
Finally, the property is just a function as mentioned previously.

Skipping QuickCheck tests

Tests can be conditionally skipped from within the property simply by returning .qch.discard from the property. QuickCheck will succeed if all non-discarded tests pass.

.qch.check .qch.forall [.qch.g.int[]] {
    // Discard the test if negative
    if [x < 0; : .qch.discard];
    : x <= 0;
    }

Running QuickCheck tests

QuickCheck tests can be run two ways: standalone, or integrated within qCumber.

Running QuickCheck standalone

To run a QuickCheck test standalone, pass the entire QuickCheck property object to .qch.check. This will run the property on 100 (default) randomly-generated arguments. If a failure is found, QuickCheck stops immediately and reports the failure. .qch.check returns a basic dictionary of information about the test run. In particular, this dictionary has a success field that specifies whether the 100 tests passed, or if a failure was found. If a failure was found, the arguments that cause the failure will be kept in the failed field of the resulting dictionary.

q).qch.summary .qch.check .qch.forall2 [.qch.g.list[];.qch.g.list[]] { reverse[x,y] ~ reverse[y],reverse[x] }
OK, passed 100 tests.

If a failure occurs, the failure will be reported along with the number of tests that passed and the failed arguments:

q).qch.summary .qch.check .qch.forall [.qch.g.list .qch.g.int[10]] { not 5 in x };
Failed! Falsifiable (after 2 tests).
   Counter-example:
      [0]: 4 0 4 7 6 1 9 0 3 1 1 1 0 6 9 2 5i
   Shrunk (2 iters):
      [0]: ,5i

Above we are asserting that there is no list of integers less than five that contains a 5. QuickCheck fails after two tests and reports the list that caused the failure. QuickCheck then attempts to make a smaller failing test case out of the one that it found. It does this until it cannot find a smaller reproducible case. In the above, QuickCheck determined that the smallest failing test case for the assertion that no list contains a 5 is a list with a single element: 5.

Running QuickCheck with qCumber

QuickCheck can also be run from within qCumber. Simply place the QuickCheck specification into a property block within qCumber (at the same level as should blocks).

feature QuickCheck Example
    property no list of integers less than 10 contains a 5, clearly falsifiable
        .qch.check .qch.forall [.qch.g.list .qch.g.int[10]] { not 5 in x }

    property all lists have the same length, also clearly falsifiable
        .qch.check .qch.forall2 [.qch.g.list[]; .qch.g.list[]] { count[x] ~ count y }

Changing the run times

By default, QuickCheck checks each property 100 times with random arguments each time. You can change the number of tests globally by executing:

.qch.setTimes X

where X is an integer. For example, .qch.setTimes 10 will change the default from 100 runs to 10 runs.

Observing argument distributions

To get an idea of what is being generated by QuickCheck, custom classifiers can be used. A classifier is a function from a generated value to a classification string.

intensify : { x + 2 };

prop_intensify : { (x + 2) = intensify x };

// An optional classifier to get random distribution information
// on the generated data
classifier: { $[x>0; "positive"; x=0; "zero"; "negative"] };

.qch.summary .qch.check
    // Run this classifier on each input (optional)
    .qch.with.classifier[classifier]
    .qch.forall [.qch.g.int[]] prop_intensify;

Result:

OK, passed 100 tests.
   Classifications:
      54% negative
      46% positive

Writing generators

Random data is specified by building descriptions of the data using the QuickCheck generators. The generators are in the .qch.g module. These can be used independently of QuickCheck.

First, a random value can be extracted for a generator by reifying the generator:

.qch.g.reify generator

Base values

A generator exists for each base type (names are always lowercase).

Numbers

Numbers are generated using the Q first 1?x. If not specified, x is set to be the zero value of the correct type (generating random numbers from the entire domain of the type). If x is specified, then random numbers from 0 to x are generated of the correct type.

q).qch.g.reify .qch.g.int[]
1421176632i

q).qch.g.reify .qch.g.int[5i]
3i

Arbitrary numbers can be generated with .qch.g.number[] and arbitrary integers (short, int, or real) with .qch.g.integer[].

Ranges

For a range of numbers, the g.range.* functions exist:

q).qch.g.reify .qch.g.range.float[-20f; -15f]
-18.28338

Others

All other base types are g.*, and do not take an argument:

q).qch.g.reify .qch.g.timestamp[]
2002.11.07D00:58:59.949598160

q).qch.g.reify .qch.g.symbol[]
`pifidjm

Complex values

Vectors

The random primitive in q (left?right) is wrapped in a generator called .qch.g.vector[left; right]. This is much faster than generating large vectors with .qch.g.list (below) since it does not sample for each element. Large tables can be constructed by flipping a dictionary whose values are created with .qch.g.vector.

q).qch.g.reify .qch.g.vector[100000; 5j]
2 3 1 3 4 2 4 4 1 3 1 2 2 2 3 2 0 1 2 1 4 3 1 3 0 3 1 2 1 ...

q)flip .qch.g.reify .qch.g.dict `a`b!(.qch.g.vector[3; 0]; .qch.g.vector[3;0])
a                    b
----------------------------------------
-1688661736184075239 3838750325811358196
269882543935740396   2749155131475518269
1739421853105997842  8904078131295199521

Lists

Lists of arbitrary length or given length take a generator specifying the type of the list:

q)// An arbitrary-length list of booleans
q).qch.g.reify .qch.g.list .qch.g.boolean[]
1011001000b

q)// A length 5 list of anything
q).qch.g.reify .qch.g.listn[5] .qch.g.any[]
(2002.04.06;2;29105h;0b;-6565914219366623234)

Of course, these are themselves generators, so a list of 3x3 0-1 matrices can be created easily:

.qch.g.reify
    .qch.g.list
    .qch.g.listn[3]
    .qch.g.listn[3]
    .qch.g.oneof (.qch.g.const 0; .qch.g.const 1)

Result:

0 1 0   0 0 1
0 1 1   0 0 0
1 1 0   1 1 1

Tuples

Tuples are finite heterogeneous lists, described with the g.tuple generator, which takes a list of generators:

q).qch.g.reify .qch.g.tuple (.qch.g.int[]; .qch.g.guid[])
(2074000032i;94324512-1e59-beca-441b-3c12ba35faac)

Dictionaries

Dictionaries are generated with the g.dict generator. If given a dictionary of generators, it creates a dictionary of the same shape:

.qch.g.reify .qch.g.dict `a`b`c!(
    .qch.g.boolean[];
    .qch.g.guid[];
    .qch.g.oneof (.qch.g.list .qch.g.char[]; .qch.g.symbol[]))

Result:

a| 1b
b| fdfe2b7f-e821-b7d7-894e-cbb1200af295
c| "pwlkkki"

Otherwise, a random dictionary can be created without specifying an argument:

q).qch.g.reify .qch.g.dict[]
jcnhj| -964h
alhjk| 09:19
agjkg| 759652475045247869
jkhfi| "k"
ffjmj| `kdmn

Tables

Like dictionaries, tables can be created with a table argument (keyed or unkeyed), or with no argument for a random table.

.qch.g.reify .qch.g.table
    ([id:enlist .qch.g.guid[]] name:enlist .qch.g.list .qch.g.char[])

Result:

id                                  | name
------------------------------------| -------------------
3ca8d02d-5714-d124-6553-91c326b33ddb| "bmv"
c77656aa-64b9-f2b8-df67-585b9ba26c5f| "cdoddz"
f74ed316-d963-5f42-809f-d7b56cf39faa| "lfaxpvrz"

Changing the maximum size

The maximum size of the generated value (20 by default) can be configured on each generator. For example, if we want to create lists of length > 1234, we can specify the custom max length with:

q)count .qch.g.reify .qch.g.maxSize[1234] .qch.g.list .qch.g.int[]
984

Note however, that this only works on fully-expressed generators. Many generators are built in terms of other generators, in which case the inner generators will not have the local max size applied, such as in the following:

q)count .qch.g.reify .qch.g.maxSize[1234] .qch.g.list[]
12

If this is required, the global maxSize can be set manually, which will apply for all generators that haven’t had their maxSize property set manually.

q).qch.g.i.MAXSIZE : 1234

q)count .qch.g.reify .qch.g.maxSize[1234] .qch.g.list[]
693

Custom combinators (adding nulls, infinities, constants)

The customizers .qch.g.const, .qch.g.any, .qch.g.oneof, .qch.g.oneof_w, and .qch.g.integer allow more interesting structures to be built.

function	effect
`g.const`	turns a q value into a generator for use with any of the other generators. `.qch.g.reify` results in the given value.
`g.any`	returns a random base value each time it is reified.
`g.elements`	takes a list of values (not generators), and returns one of the values at random each time.
`g.oneof`	takes a list of generators, and returns a value of one of the generators at random each time.
`g.oneof_w`	takes a list of generators and a list of weights, and returns a value of one of the generators with probability defined by the weights. Lists with nulls and infinities can be created along a desired distribution, as shown below.

q).qch.g.reify .qch.g.list .qch.g.oneof_w[(.qch.g.int[]; .qch.g.const 0Ni); 10 1]
-116430526 1724339645 0N 680252841 -750478914 -1331497625i

Creating new generators

In some cases, the generators may not be enough to generate data of certain formats. For example, if testing a tree library with tree implemented as tables, the default g.table[] generator will generate nonsense trees. Instead, a new generator should be built for generating well-formed trees.

A generator is just a dictionary with three fields: arb, shrink, and less.

A new generator can be created with .qch.g.new by specifying the three behaviors (shrink and less are optional and should be null (::) if not used):

treeGen : .qch.g.new (
    // Generating function
    {[]
        tree : // Create a random tree
        : tree;
        };

    // Shrinking function or null (return a list of possible shrunk values)
    ::;

    // Less-Than function or null (define what 'size' means for the custom data)
    ::
  );

The shrinking function is given the current failing argument, and should return a list of possible smaller arguments. For example, the shrinking function for lists randomly samples the argument list and returns 20 new lists (a list of 20 lists). The less-than function takes two shrunk values (i.e., two sampled lists for the list generator), and returns true if the first is strictly smaller than the second – this defines what the notion of ‘size’ means and what should be minimized. The less-than function is used to sort the list of possibly shrunk values in order from smallest to largest so that QuickCheck tests the smallest cases first. For lists, size is just the number of elements in the list. An example of the shrinking function and less-than function for lists is provided below:

// List shrinking and less-than:
// Shrink
{[list] : {[x;y;z] x@where y?0b}[list;count list] each til 20; };
// Less Than
{[a;b] : (count a) < (count b); }

The new generator can then be used to test properties of the library:

.qch.summary
    .qch.check
    .qch.forall [treeGen] { .mytree.isBalanced .mytree.balance x }

Testing APIs and state with QuickCheck

QuickCheck itself handles testing ‘unit-level’ properties, i.e., properties on individual functions. This is great if the function is pure, but it is difficult to write a QuickCheck test for a stateful function. Further, the interaction between the entire API is often not captured by a set of properties. QuickCheck for APIs and state allows the entire system to be specified in a similar way to QuickCheck properties. The entire system specification is then given to QuickCheck which tries to find inconsistencies between the specification and the API implementation.

Description

QuickCheck models each API as a ‘command’ which contains a label, code to execute the command, a function to generate random arguments, and pre-/post-conditions for the command with the following signatures:

Command ::
    label : symbol
    execute : args -> any
    args : () -> [gen]
    precondition : () -> bool
    postconditions : [(laststate, currentstate, args, return) -> bool]

and then models the system with a specification of the following form:

Specification ::
    init : () -> ()
    currentState : () -> any
    commands : [Command]

Example

As a simple example, let's test a simple stateful key-value store. This key-value store will store the data in global state to simplify the example, but threading APIs can be tested in a similar way by storing the threaded state in a temporary global.

// Key-value store
.kv.init    : { .kv.STATE :: ()!() };
.kv.add     : { .kv.STATE[x] :: y };
.kv.remove  : { .kv.STATE :: enlist[x] _ .kv.STATE };
.kv.get     : { .kv.STATE[x] };

Commands

`add`

Now that we have an API, we should specify the behavior of the API. Each API (other than the initialization) is modelled with a Command. For example, here is the command modelling the "add" command:

cmd_add : .qch.state.cmd (
    // label
    `add;
    // execute
    .kv.add;
    // args
    {[st] (.qch.g.symbol[]; .qch.g.any[]) };
    // precondition
    {1b};
    // postconditions
    enlist {[cs;ls;a;r] .kv.STATE ~ ls , enlist[a 0]!enlist[a 1] }
    );

The command is given a label and the actual API call. The third argument is a function that takes the current state of the system and returns a list of QuickCheck generators that describe arguments to pass to the function. add takes a symbol and any value, so this function returns a list of a symbol generator and an any generator.

Next, add can be called with no conditions, so the preconditions always return true.

Last, a list of postcondition functions is provided. add has only one postcondition – the new state is the old state with the new key mapping to the new value. Note that each postcondition function is given the state before calling add, the arguments to add, and the return value of add. This allows the postconditions to test against the return value, and check for changes in the state as a result of the call.

`remove`

We can go ahead and add a specification for remove as well:

cmd_remove : .qch.state.cmd (
    // label
    `remove;
    // execute
    .kv.remove;
    // args
    {[st] enlist .qch.g.elements key .kv.STATE };
    // precondition
    { 0 < count key .kv.STATE };
    // postconditions
    (   {[cs;ls;a;r] not a[0] in key .kv.STATE };
        {[cs;ls;a;r] count[key ls] = 1 + count key .kv.STATE })
    );

There are some interesting things to note about this specification. First, remove relies on something being in the store before it can be called, this is specified in the precondition. Next, the arguments to remove rely on the current state. (A random key is chosen from the current store). Last, there are two postconditions rather than one. The first checks that the key now maps to nothing, and the seconds checks that the key has been removed from the store keys.

`get`

The specification for get follows in the same way:

cmd_get : .qch.state.cmd (
    // label
    `get;
    // execute
    .kv.get;
    // args
    {[st] enlist .qch.g.elements key .kv.STATE };
    // precondition
    { 0 < count key .kv.STATE };
    // postconditions
    (   {[cs;ls;a;r] r ~ .kv.STATE a 0 };
        {[cs;ls;a;r] ls ~ .kv.STATE })
    );

Specification

Finally, the commands are gathered in a specification of the system:

dict_spec : .qch.state.spec (
    // initialization
    { .kv.init[]; .kv.STATE };
    // state snapshot
    { .kv.STATE };
    // commands/api
    (cmd_add; cmd_remove; cmd_get)
    );

The first argument is a function that performs any stateful initialization. The next is a function that returns a snapshot of the current state of the system, whatever that may be. The last is a list of the commands we have already defined.

Run

We can then run this system specification by passing it to QuickCheck:

q).qch.state.summary .qch.state.check dict_spec
State simulation results:

        ::  =  add     @  `b, 6344385196431618702  <-  ()!()
        ::  =  remove  @  `b                       <-  (,`b)!,6344385196431618702
        ::  =  add     @  `doadbh, 1377664762i     <-  (`symbol$())!`long$()

Failed! Falsifiable (after 3 calls).
        Error: add postcondition 0 not satisfied

The output present in the IO of the simulation path. The path can be read as RETURNVALUE = CALL @ [ ARGS ] <- WITHSTATE. Looking at the result, we are told that the test failed because the first postcondition of add could not be satisfied. We can see that this only happens when add is called twice, with a different argument the second time. Looking at the state, after adding the first element, the underlying dictionary becomes a typed dictionary, and adding anything else will fail.

Fix

Let’s fix the issue:

    .kv.init    : { .kv.STATE :: ()!() };
    .kv.add     : { .kv.STATE[x] :: .kv.STATE , enlist[x]!enlist y };

// Ensure the result is not typed
    .kv.remove  : { .kv.STATE :: enlist[x] _ .kv.STATE };
    .kv.get     : { .kv.STATE[x] };

If we run the test again, we get:

State simulation results:
...
OK, passed simulation with 63 calls.

Looks like we fixed the bug.

Wrapping in QuickCheck for qCumber and repeated simulations

This runs only one simulation though, so to be sure, let’s wrap this in a QuickCheck test and decrease the probability of stopping:

dict_spec : .qch.state.spec (
    // initialization
    { .kv.init[]; .kv.STATE };
    // state snapshot
    { .kv.STATE };
    // commands/api
    (cmd_add; cmd_remove; cmd_get);
    0.005   // <- 0.5% chance of stopping (large simulations)
    );

.qch.state.quick dict_spec;

Result:

1b

Quickcheq passed 100 random simulations of the API.