
Statistical Transformation Functions

The .st namespace provides statistical transformation implementations, primarily for use with GG graphics. For the GG wrappers, see .gg.stat.



Name Type Description
<param> dict
<param>.applyF fn (table) → table
<param>.colmap symbol[]


Type Description
fn (table) → table a transform function



Name Type Description
symbol[] a list of symbols that will be output identically to the input
<param> dict
<param>.applyF fn (table) → table
<param>.colmap symbol[]


Name Type Description
<returns> dict
<returns>.applyF fn (table) → table
<returns>.colmap symbol[]



Name Type Description
<param> dict
<param>.applyF fn (table) → table
<param>.colmap symbol[]


Type Description
symbol[] a list of symbols that will be output identically to the input


Name Type Description
(fn (table) → table; symbol[]) A tuple of:

  • a transform function
  • a list of symbols that will be output identically to the input


A 1d binning stat


Name Type Description
column symbol
binspec (symbol; number; number) width or count (`w or `c), argument, padding
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i


Type Description
dict table -> transformed table

See Also: .st.bin1d


2d binning transform


Name Type Description
columns symbol[] list of two column names
binspec1 (symbol; number; number) width or count (`w or `c), argument, padding
binspec2 (symbol; number; number) same format as binspec1
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i


Type Description
dict table -> transformed table

See Also: .st.bin2d


nD binning transform


Name Type Description
columns symbol[] list of n column names
binspecs (symbol; number; number)[] list of n triples of: width or count (`w or `c), argument, padding
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i


Type Description
dict table -> transformed table

See Also: .st.binNd


Least-squares regression transform function.

Produces a table of values along the least-squares regression of the input table.


Name Type Description
x symbol column
y symbol column
degree number degree of the least-squares fit polynomial


Type Description
dict transform object

See Also: .st.lsqTable


Moving average statistic


Name Type Description
num long the number of values to be averaged at each point
x symbol column
y symbol column
g symbol | null group column


Type Description
dict transform object
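The moving average can be illustrated with a short Python sketch of the arithmetic. This is not the q implementation. The sketch makes two assumptions: it uses a trailing window in which the first num-1 points average over whatever values are available (as q's built-in mavg does), and it expects rows already sorted by x within each group. The name moving_avg is hypothetical.

```python
from collections import defaultdict, deque


def moving_avg(num, xs, ys, groups=None):
    """Trailing moving average of ys, computed separately per group.

    Assumes rows are sorted by x within each group; the first num-1 points
    in a group average over the values seen so far (like q's mavg).
    """
    if groups is None:
        groups = [None] * len(xs)  # a single implicit group
    windows = defaultdict(lambda: deque(maxlen=num))  # recent ys per group
    out = []
    for x, y, g in zip(xs, ys, groups):
        window = windows[g]
        window.append(y)  # the deque drops the oldest value once full
        out.append((g, x, sum(window) / len(window)))
    return out


print(moving_avg(3, [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
```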


Compute the outliers component of a box-plot


Name Type Description
catcol symbol categorical column name
numcol symbol continuous column name


Type Description
fn table -> transformed table

See Also: .st.outliers


A summarizing 1d bin transform that adds a constant 0 column (const__)


Name Type Description
column symbol column name
aggs dict aggregators to use (see .st.a)


Type Description
dict new stat transform


Compute the quantiles of a numeric column


Name Type Description
column symbol column name


Type Description
dict table -> quantile table

See Also: .st.quantile


Compute the quartiles of a column for each distinct value of another column


Name Type Description
catcol symbol categorical column name
numcol symbol continuous column name


Type Description
dict table -> transformed table

See Also: .st.quartiles


Scaled 1d bin (e.g., log bins)


Name Type Description
column symbol column name
binspec (symbol; number; number) width or count (`w or `c), argument, padding
sc dict see .gg.scale
aggs dict see .st.a.*


Type Description
dict table -> transformed table

See Also: .st.sbin1d


Scaled 2d bin (e.g., log bins)


Name Type Description
columns symbol[] 2 column names
binspec1 (symbol; number; number) width or count (`w or `c), argument, padding
binspec2 (symbol; number; number) same format as binspec1
scale1 dict see .gg.scale
scale2 dict see .gg.scale
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i


Type Description
dict table -> transformed table

See Also: .st.sbin2d


Scaled nD bin (e.g., log bins)


Name Type Description
columns symbol[] n column names
binspecs (symbol; number; number)[] n triples of: width or count (`w or `c), argument, padding
scales dict[] n scales (see .gg.scale)
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i


Type Description
dict table -> transformed table

See Also: .st.sbinNd


Compute 5-number summaries of a column for each distinct value of another column


Name Type Description
catcol symbol categorical column name
numcol symbol continuous column name


Type Description
dict table -> transformed table

See Also: .st.summary


Return a description of an avg aggregation.

Note: the output is mapped to the column name. For an aggregation with an explicit output mapping (for example, to avoid collisions with other aggregations on the same column), see .st.a.custom.


Name Type Description
col symbol column name to avg

See Also: .st.a.custom

Example: An average aggregation of a column

 .st.a.avg[`mycolumn]

Example: A count and avg aggregation

 .st.a.count[] , .st.a.avg[`mycolumn]


Return a count aggregation description. The output is mapped to a column named count__.


Type Description

Example: A count aggregation

 .st.a.count[]


Return a description of a custom aggregation on a table. The custom function should take a list of the column's type and return a single value (e.g., avg, dev, or {count distinct x}).


Name Type Description
n symbol name of resulting column
col symbol name of column to aggregate
customF fn function to aggregate sublists of the column


Type Description

Example: Custom average aggregator

 .st.a.custom[`outputName__; `mycolumn; avg]

Example: A count aggregation and a custom aggregator that counts occurrences of a value

 .st.a.count[] , .st.a.custom[`output__; `mycolumn; {count where x = `something}]
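Conceptually, each aggregation description pairs an output name with a column and a function. The function is applied to that column's values in each bin or group. The Python sketch below models this with a hypothetical aggregate helper and plain dicts. It illustrates the concept only and is not the .st data structure. It also shows why distinct output names let several aggregations read the same column without colliding.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, by, specs):
    """Group rows (dicts) on column `by`, then apply each aggregation spec.

    specs maps an output name to (column, fn); fn receives the list of that
    column's values within one group and returns a single value.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return {
        key: {name: fn([m[col] for m in members]) for name, (col, fn) in specs.items()}
        for key, members in groups.items()
    }


rows = [{"g": "a", "v": 1}, {"g": "a", "v": 3}, {"g": "b", "v": 5}]
specs = {
    "count__": ("v", len),
    "avg__": ("v", mean),  # same source column, distinct output name
    "ones__": ("v", lambda xs: sum(1 for x in xs if x == 1)),
}
result = aggregate(rows, "g", specs)
print(result)
```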


Return a description of a max aggregation.

Note: the output is mapped to the column name. For an aggregation with an explicit output mapping (for example, to avoid collisions with other aggregations on the same column), see .st.a.custom.


Name Type Description
col symbol column name to max

See Also: .st.a.custom


Return a description of a min aggregation.

Note: the output is mapped to the column name. For an aggregation with an explicit output mapping (for example, to avoid collisions with other aggregations on the same column), see .st.a.custom.


Name Type Description
col symbol column name to min

See Also: .st.a.custom


Return a description of a sum aggregation.

Note: the output is mapped to the column name. For an aggregation with an explicit output mapping (for example, to avoid collisions with other aggregations on the same column), see .st.a.custom.


Name Type Description
col symbol column name to sum

See Also: .st.a.custom


Perform a 1D binning on a table, performing all specified aggregations.


Name Type Description
col symbol column to bin
val (symbol; number) width or count (`w or `c) and argument pair
aggs dict dictionary of aggregation descriptions (see .st.a.*)
options dict | null see .st.bin2d
table table


Type Description
table binned and aggregated table


Type Description
"column x not found"

Example: Basic categorical bin with count aggregation

      t: ([]x:45?5?`8; v:45?45);

      .st.bin1d[`x; ::; .st.a.count[]; ::; t]

 /=> x        x_start__ x_end__  count__
 /=> -----------------------------------
 /=> akkihkkm akkihkkm  dkphkccc 5      
 /=> dkphkccc dkphkccc  fchdbpfd 11     
 /=> fchdbpfd fchdbpfd  mbpcngkg 11     
 /=> mbpcngkg mbpcngkg  pdhioofe 8      
 /=> pdhioofe pdhioofe           10    

Example: Categorical bin with count and avg aggregations

      .st.bin1d[`x; ::; .st.a.count[] , .st.a.custom[`myoutput__;`v;avg]; ::; t]

 /=> x        x_start__ x_end__  count__ myoutput__
 /=> ----------------------------------------------
 /=> biamifgg biamifgg  ekeilfak 12      17.25     
 /=> ekeilfak ekeilfak  obikddhi 4       22        
 /=> obikddhi obikddhi  oebfende 11      28.63636  
 /=> oebfende oebfende  pbaioapc 7       14.42857  
 /=> pbaioapc pbaioapc           11      21        

Example: Numeric bin with count aggregation

      .st.bin1d[`v; ::; .st.a.count[]; ::; t]

 /=> v  v_start__ v_end__ count__
 /=> ----------------------------
 /=> 0  0         3       5      
 /=> 3  3         6       3      
 /=> 6  6         9       4      
 /=> 9  9         12      2      
 /=> 12 12        15      1      
 /=> 15 15        18      5      
 /=> ...

Example: Custom numeric bin with count aggregation and centered output

      .st.bin1d[`v; (`w;10;0); .st.a.count[]; enlist[`center]!enlist 1b; t]
                 // ^ 10-unit wide bins
                                              // ^ center the output point    

 /=> v  v_start__ v_end__ count__
 /=> ----------------------------
 /=> 5  0         10      13     
 /=> 15 10        20      9      
 /=> 25 20        30      8
 /=> 35 30        40      10     
 /=> 45 40        50      5      
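The fixed-width numeric case can be sketched in Python as follows. The helper bin1d_count is hypothetical. Two of its behaviors are assumptions rather than documented facts: bins start at multiples of the width, and empty bins are omitted. With center set, the reported value is the bin midpoint, as in the centered example above.

```python
import math
from collections import Counter


def bin1d_count(values, width, center=False):
    """Fixed-width 1D binning with a count aggregation.

    Returns (v, start, end, count) rows for non-empty bins only. Bins are
    assumed to start at multiples of `width`.
    """
    counts = Counter(math.floor(v / width) * width for v in values)
    rows = []
    for start in sorted(counts):
        v = start + width / 2 if center else start  # midpoint when centered
        rows.append((v, start, start + width, counts[start]))
    return rows


print(bin1d_count([0, 3, 9, 10, 15, 25, 49], 10, center=True))
```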


Perform a 2D binning and all specified aggregations on the bins of a specified table.

If hexbins are requested, normalization and centering are disabled.


Name Type Description
columns symbol[] pair of column names
xbins (symbol; number) width or count (`w or `c) and argument
ybins (symbol; number) width or count (`w or `c) and argument
mods dict aggregations to perform
options dict (norm: symbol; center: boolean; hex: boolean) | null options for binning
table table


Type Description


Type Description
"column x not found"

See Also: .st.bin1d

Example: Basic 2d binning using all defaults

     .st.bin2d[`x`y; ::; ::; .st.a.count[]; ::; ([]x:til 45; y: til 45)]

Example: Bin with 40 x bins

     .st.bin2d[`x`y; (`c;40;0); ::; .st.a.count[]; ::; ([]x:til 45; y: til 45)]

Example: Bin with centered bins

     .st.bin2d[`x`y; ::; ::; .st.a.count[]; enlist[`center]!enlist 1b; ([]x:til 45; y: til 45)]

Example: Bin with x normalized by y

     .st.bin2d[`x`y; ::; ::; .st.a.count[]; enlist[`norm]!enlist `x; ([]x:til 45; y: 45?`a`b`c)]

Example: Bin with custom sum aggregation on x

     .st.bin2d[`x`y; ::; ::; .st.a.count[] , .st.a.custom[`newx; `x; sum]; ::; ([]x:til 45; y: 45?`a`b`c)]

Example: Bin using a by-clause and a bin

     .st.bin2d[`x`y; ::; `by; .st.a.count[] , .st.a.custom[`newx; `x; sum]; ::; ([]x:til 45; y: 45?`a`b`c)]


Perform an nD binning and all specified aggregations on the bins of a specified table.


Name Type Description
columns symbol[] list of column names to bin
xbins (symbol; number; number) width or count (`w or `c) and argument
ybins (symbol; number; number) width or count (`w or `c) and argument
mods dict aggregations to perform
options dict | null see .st.bin2d
table table


Type Description


Type Description
"column x not found"

See Also: .st.bin1d
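In n dimensions, each row falls into the bin formed by combining its per-column bins. Below is a minimal Python sketch with a hypothetical bin_nd_count helper. It assumes fixed-width bins anchored at multiples of each width and supports a count aggregation only.

```python
import math
from collections import Counter


def bin_nd_count(rows, widths):
    """Count rows per n-dimensional fixed-width bin.

    rows are n-tuples and widths holds one bin width per dimension. Each key
    is the tuple of bin starts (assumed to be multiples of each width);
    empty bins never appear.
    """
    keys = (
        tuple(math.floor(v / w) * w for v, w in zip(row, widths))
        for row in rows
    )
    return dict(Counter(keys))


print(bin_nd_count([(1, 1), (2, 9), (11, 4), (12, 14)], (10, 10)))
```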


Calculates the factorial of a number. This uses floats, because longs overflow too quickly (21! already exceeds the largest 64-bit long).


Name Type Description
x Number


Type Description
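The reason for using floats can be checked directly. 20! is the largest factorial that fits in a signed 64-bit long, while a double reaches 170! before overflowing to infinity. Below is a Python sketch; factorial_float is a hypothetical name.

```python
import math

LONG_MAX = 2**63 - 1  # largest signed 64-bit integer (a q long)


def factorial_float(n):
    """n! accumulated in floating point (a double)."""
    result = 1.0
    for k in range(2, int(n) + 1):
        result *= k  # overflows to inf rather than wrapping around
    return result


# 20! fits in a long but 21! does not; a double holds up to 170!.
print(math.factorial(20) <= LONG_MAX < math.factorial(21))
print(factorial_float(170), factorial_float(171))
```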


Generate random data from a normal distribution


Name Type Description
n Long The number of points to generate


Type Description
Float[] The random data points
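One common way to generate normally distributed data is the Box-Muller transform, sketched below in Python. This page does not say whether .st uses this technique. The sketch produces standard normal values (mean 0, variance 1), and normal_sample is a hypothetical name.

```python
import math
import random


def normal_sample(n, rng=random):
    """n standard normal draws via the Box-Muller transform."""
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is defined
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        # Each pair of uniforms yields two independent normal values.
        out.append(r * math.cos(2 * math.pi * u2))
        out.append(r * math.sin(2 * math.pi * u2))
    return out[:n]


print(normal_sample(5, random.Random(0)))
```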


Return a 1000-point sampling of the d-degree least-squares fit of the x and y columns of the given table.


Name Type Description
x symbol column name
y symbol column name
d long degree of the fit (between 0 and 4)
t table | dict table or .gg.tbl.ty instance


Type Description
table 1000-point sampling


Return a 1000-point sampling of the d-degree least-squares fit of the x and y columns of the given table, grouped by column g.


Name Type Description
x symbol column name
y symbol column name
g symbol group column name
d long degree of the fit (between 0 and 4)
t table | dict table or .gg.tbl.ty instance


Type Description
table 1000-point sampling


Return the coefficients of the d-degree least-squares fit on the given table


Name Type Description
x symbol column
y symbol column
d long degree of the fit (between 0 and 4)
table table | dict table or .gg.tbl.ty instance


Type Description
number[] coefficients
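A least-squares polynomial fit can be computed by solving the normal equations (A^T A) c = A^T y, where A has columns 1, x, x^2, ..., x^d. The Python sketch below solves them with Gauss-Jordan elimination. It then samples the fit at 1000 evenly spaced points, mirroring the 1000-point sampling described above. The following are assumptions, not the .st implementation:

  • the helper names
  • the ascending coefficient order
  • the evenly spaced grid over [min x, max x]

```python
def lsq_coeffs(xs, ys, d):
    """Coefficients c0..cd (ascending powers) of the degree-d least-squares
    polynomial, via the normal equations. Needs at least d+1 distinct xs.
    """
    n = d + 1
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lsq_sample(xs, ys, d, points=1000):
    """Evaluate the fit at `points` evenly spaced x values over [min, max]."""
    c = lsq_coeffs(xs, ys, d)
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / (points - 1)
    grid = [lo + k * step for k in range(points)]
    return [(x, sum(ci * x ** i for i, ci in enumerate(c))) for x in grid]


print(lsq_coeffs([0, 1, 2, 3, 4, 5], [1, 6, 17, 34, 57, 86], 2))
```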


The probability density function of a normal distribution


Name Type Description
u Number The mean value
v Number The variance
x Number The independent variable


Type Description
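Note that v is the variance, not the standard deviation. The density is therefore f(x) = exp(-(x - u)^2 / (2v)) / sqrt(2πv). Below is a direct Python transcription; normal_pdf is a hypothetical name.

```python
import math


def normal_pdf(u, v, x):
    """Normal density at x with mean u and variance v (not std. deviation)."""
    return math.exp(-((x - u) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)


print(normal_pdf(0, 1, 0))
```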


Return the "outliers" component of a box-plot: all data points that lie more than 1.5 times the interquartile range from the median.


Name Type Description
catcol symbol categorical column
numcol symbol numeric column
table table


Type Description

See Also: .st.summary


      t : ([]x:45?5?`8; y:45?45);

      .st.outliers[`x; `y; t]

 /=> x        y 
 /=> -----------
 /=> mijpkecf 44
 /=> kiggemin 39
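The rule above can be sketched in Python. The helper outliers_by is hypothetical. The quartile interpolation, statistics.quantiles with method="inclusive", is an assumption, because this page does not give the exact quartile definition used by .st.

```python
from collections import defaultdict
from statistics import median, quantiles


def outliers_by(cats, nums):
    """(category, value) pairs lying more than 1.5 * IQR from the group median.

    Quartiles use statistics.quantiles(method="inclusive"), an assumption;
    every group needs at least two values.
    """
    groups = defaultdict(list)
    for c, v in zip(cats, nums):
        groups[c].append(v)
    limits = {}
    for c, vals in groups.items():
        q1, _, q3 = quantiles(vals, n=4, method="inclusive")
        limits[c] = (median(vals), 1.5 * (q3 - q1))  # (center, reach)
    return [(c, v) for c, v in zip(cats, nums)
            if abs(v - limits[c][0]) > limits[c][1]]


print(outliers_by(["a"] * 10, list(range(1, 10)) + [100]))
```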


The probability mass function of a Poisson distribution


Name Type Description
l Number The mean value
k Number The number of occurrences


Type Description
Float The probability of a given outcome
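The Poisson mass function is P(k) = l^k e^(-l) / k!. The Python sketch below evaluates it in log space, using math.lgamma(k + 1) = ln k!, so that large k does not overflow. poisson_pmf is a hypothetical name, and l must be positive.

```python
import math


def poisson_pmf(l, k):
    """P(K = k) for a Poisson distribution with mean l > 0."""
    return math.exp(k * math.log(l) - l - math.lgamma(k + 1))


print(poisson_pmf(2, 2))
```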


Perform a quantile transform on a numeric column


Name Type Description
x symbol numeric column name
table table


Type Description


Type Description
"column x not found"
"column x of type y not one of z"



      t: ([]x:45?45);

      .st.quantile[`x; t]

 /=> x  fvalue__  
 /=> -------------
 /=> 0  0.01111111
 /=> 0  0.03333333
 /=> 1  0.05555556
 /=> 1  0.07777778
 /=> 1  0.1       
 /=> 2  0.1222222 
 /=> 2  0.1444444 
 /=> ...
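The fvalue__ column in the example appears to follow the plotting position (i - 0.5) / n for the i-th smallest value. Here n = 45, giving 0.5/45 ≈ 0.0111, 1.5/45 ≈ 0.0333, and so on. Below is a Python sketch of that calculation, inferred from the example output rather than from the .st source; quantile_transform is a hypothetical name.

```python
def quantile_transform(values):
    """Pair each sorted value with (i + 0.5) / n for 0-based rank i."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 0.5) / n) for i, x in enumerate(xs)]


print(quantile_transform([3, 1, 2]))
```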


Compute the quartiles of column y for each distinct value of column x.

The output columns are the following:

  • the first column has the same name as the given categorical column (x)
  • q1__ - first quartile
  • q2__ - second quartile
  • q3__ - third quartile


Name Type Description
x symbol categorical column name
y symbol numeric column name
table table


Type Description


Type Description
"column x not found"
"column x of type y not one of z"

See Also: .st.summary


      t: ([]x:45?5?`8; y:45?45);

      .st.quartiles[`x; `y; t]

 /=> x        q1__ q2__ q3__
 /=> -----------------------
 /=> npccjbfg 4    23   32.5
 /=> kcjfooab 7.5  23.5 31  
 /=> jnmhejla 14   23   30  
 /=> iiphmkna 16   25.5 31  
 /=> gnlighkg 22   34   41  
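Below is a Python sketch of the per-category quartile calculation. The helper quartiles_by is hypothetical. statistics.quantiles(method="inclusive") is an assumed interpolation scheme; .st may compute quartiles differently, so results can differ slightly on small groups.

```python
from collections import defaultdict
from statistics import quantiles


def quartiles_by(cats, nums):
    """Return {category: (q1__, q2__, q3__)} for each distinct category."""
    groups = defaultdict(list)
    for c, v in zip(cats, nums):
        groups[c].append(v)
    # Each group needs at least two values for statistics.quantiles.
    return {c: tuple(quantiles(vals, n=4, method="inclusive"))
            for c, vals in groups.items()}


print(quartiles_by(["a"] * 9, list(range(1, 10))))
```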


Calculate a column's statistics if the column is of type 1, 4-10, or 12-19; otherwise, return an empty statistics dictionary.


Name Type Description
col list The column, as a list, on which the statistics are calculated.


Type Description
dict The stats keyed by statistical operation.


For a given table, construct the statistics on each column


Name Type Description
table table The (keyed) table for which statistics should be calculated.


Type Description
dict A dictionary keyed by column name, containing each column's calculated statistics.


Perform a scaled 1d bin on a table. The given scale is applied to the data before binning.


Name Type Description
col symbol column to bin
val (symbol; number) width or count (`w or `c) and arg for bin
scale dict scale (see .gg.scale)
aggs dict aggregations (see .st.a.*)
options dict | null see .st.bin2d
table table


Type Description

See Also: .st.bin1d
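With a log scale, binning in transformed space produces bins that are equal-width in log10 units and grow geometrically in data units. Below is a Python sketch; log_bin_count is a hypothetical name. It assumes a log10 scale, bins anchored at multiples of the width in log space, and a count aggregation.

```python
import math
from collections import Counter


def log_bin_count(values, width):
    """Count positive values in equal-width bins on a log10 scale.

    Returns (start, end, count) rows with edges mapped back to data units.
    """
    counts = Counter(math.floor(math.log10(v) / width) * width for v in values)
    return [(10 ** s, 10 ** (s + width), counts[s]) for s in sorted(counts)]


print(log_bin_count([1, 5, 10, 50, 100, 500], 1))
```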


Perform a scaled 2d binning and the necessary aggregations.


Name Type Description
columns symbol[] pair of symbols to bin
xbins (symbol; number; number) (`w or `c; arg; padding)
ybins (symbol; number; number) (`w or `c; arg; padding)
xscale dict scale for the x axis
yscale dict scale for the y axis
mods dict aggregations
options dict | null see .st.bin2d
table table

See Also: .st.bin1d


Perform an nD binning using the given scales. Bins can be specified by width or by count (`w or `c). A bin's padding is added to increase the range over which the bins are split. For example, categorical columns would likely specify a padding value of 1 so that bins (`c;5;1) are spaced on even numbers, rather than over (num distinct) % 5 intervals.


Name Type Description
columns symbol[] list of column names
binDescr (symbol; number; number)[] bin description (`w or `c; arg; padding)
scales dict[] list of scales for each variable
mods dict list of aggregations to perform
options dict | null see .st.bin2d
table table

See Also: .st.bin1d


Perform an nD binning using the given scales. Bins can be specified by width or by count (`w or `c). A bin's padding is added to increase the range over which the bins are split. For example, categorical columns would likely specify a padding value of 1 so that bins (`c;5;1) are spaced on even numbers, rather than over (num distinct) % 5 intervals.


Name Type Description
cs symbol[] list of column names
descr ((symbol; number; number) | null)[] bin description (`w or `c; arg; padding), or null to accept defaults
scales dict[] list of scales for each variable
mods dict list of aggregations to perform
opts dict | null see .st.bin2d
t table


Type Description
table binned data


Type Description
"column x not found"

Note: this binning function should avoid duplicating data at all costs.

See Also: .st.bin1d


Returns a five-number-summary-style report for each subset of the data split on distinct values of one column (x). The x column should be categorical, while the y column should be numeric/continuous.

The output columns are the following:

  • the first column has the same name as the given categorical column (x)
  • q1__ - first quartile
  • q2__ - second quartile
  • q3__ - third quartile
  • min__ - min
  • max__ - max
  • med__ - median
  • mean__ - average (avg)
  • upper__ - upper hinge (1.5 * interquartile range from median)
  • lower__ - lower hinge (1.5 * interquartile range from median)


Name Type Description
x symbol categorical column name
y symbol continuous column name
table table


Type Description


Type Description
"column x not found"
"column x of type y not one of z"

See Also: .st.outliers .st.quartiles


      t:([]x:45?5?`8; y:45?45);

      .st.summary[`x; `y; t]

 /=> x        q1__ q2__ q3__ min__ max__ med__ mean__   upper__ lower__
 /=> ------------------------------------------------------------------
 /=> npccjbfg 4    23   32.5 0     42    23    18.88889 42      0      
 /=> kcjfooab 7.5  23.5 31   1     42    23    20       42      1      
 /=> jnmhejla 14   23   30   7     33    22.5  21       33      7      
 /=> iiphmkna 16   25.5 31   13    41    20    24.2     41      13     
 /=> gnlighkg 22   34   41   17    42    30    29.66667 42      17
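Below is a Python sketch of the same report for one numeric column; the helper summary_by is hypothetical. Quartiles use statistics.quantiles(method="inclusive") as an assumption. upper__ and lower__ are taken as the most extreme data points within 1.5 times the interquartile range of the median. That interpretation is also an assumption, chosen because it is consistent with the example, where they equal max__ and min__.

```python
from collections import defaultdict
from statistics import fmean, median, quantiles


def summary_by(cats, nums):
    """Per-category box-plot summary mirroring the documented output columns."""
    groups = defaultdict(list)
    for c, v in zip(cats, nums):
        groups[c].append(v)
    result = {}
    for c, vals in groups.items():
        q1, q2, q3 = quantiles(vals, n=4, method="inclusive")
        med = median(vals)
        reach = 1.5 * (q3 - q1)  # assumed whisker reach around the median
        result[c] = {
            "q1__": q1, "q2__": q2, "q3__": q3,
            "min__": min(vals), "max__": max(vals),
            "med__": med, "mean__": fmean(vals),
            "upper__": max(v for v in vals if v <= med + reach),
            "lower__": min(v for v in vals if v >= med - reach),
        }
    return result


print(summary_by(["a"] * 10, list(range(1, 10)) + [100]))
```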