Statistical Transformation Functions

The .st namespace provides statistical transformation implementations, primarily for use with GG graphics. See .gg.stat for the GG wrappers.

.gg.stat.ty.applyF

Parameter(s):

Name Type Description
<param> dict
<param>.applyF fn (table) → table
<param>.colmap symbol[]

Returns:

Name Type Description
fn (table) → table a transform function

.gg.stat.ty.with.colmap

Parameter(s):

Name Type Description
symbol[] a list of symbols that will be output identically to the input
<param> dict
<param>.applyF fn (table) → table
<param>.colmap symbol[]

Returns:

Name Type Description
<returns> dict
<returns>.applyF fn (table) → table
<returns>.colmap symbol[]

.gg.stat.ty.colmap

Parameter(s):

Name Type Description
<param> dict
<param>.applyF fn (table) → table
<param>.colmap symbol[]

Returns:

Name Type Description
symbol[] a list of symbols that will be output identically to the input

.gg.stat.ty.new

Parameter(s):

Name Type Description
(fn (table) → table; symbol[]) A tuple of:
    a transform function
    a list of symbols that will be output identically to the input

.gg.stat.bin1d

A 1d binning stat

Parameter(s):

Name Type Description
column symbol
binspec (symbol; number; number) width or count (`w or `c), argument, padding
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.bin1d
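
Example: Building a 1d binning stat and applying it to a table

A sketch only: it assumes the returned dict is a .gg.stat.ty instance, so that .gg.stat.ty.applyF can extract its transform function. The table t and the names s and f are illustrative.

      t: ([] v: 45?45);

      s: .gg.stat.bin1d[`v; (`w;10;0); .st.a.count[]; ::];

      f: .gg.stat.ty.applyF s;

      f t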

.gg.stat.bin2d

2d binning transform

Parameter(s):

Name Type Description
columns symbol[] list of two column names
binspec1 (symbol; number; number) width or count (`w or `c), argument, padding
binspec2 (symbol; number; number)
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.bin2d

.gg.stat.binNd

nD binning transform

Parameter(s):

Name Type Description
columns symbol[] list of n column names
binspecs (symbol; number; number)[] list of n triples of: width or count (`w or `c), argument, padding
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.binNd

.gg.stat.lsquares

Least-squares regression transform function.

Produce a table of values along the least-squares regression of the input table

Parameter(s):

Name Type Description
x symbol column
y symbol column
degree number degree of the least-squares fit polynomial

Returns:

Name Type Description
<returns> dict transform object

See Also: .st.lsqTable

.gg.stat.outliers

Compute the outliers component of a box-plot

Parameter(s):

Name Type Description
catcol symbol categorical column name
numcol symbol continuous column name

Returns:

Name Type Description
fn table -> transformed table

See Also: .st.outliers

.gg.stat.pie

Summarizing 1d bin transform with an additional constant 0 column (const__)

Parameter(s):

Name Type Description
column symbol column name
aggs dict aggregators to use (see .st.a)

Returns:

Name Type Description
<returns> dict new stat transform

.gg.stat.quantile

Compute the quantiles of a numeric column

Parameter(s):

Name Type Description
column symbol column name

Returns:

Name Type Description
<returns> dict table -> quantile table

See Also: .st.quantile

.gg.stat.quartiles

Compute the quartiles of a column for each distinct value of another column

Parameter(s):

Name Type Description
catcol symbol categorical column name
numcol symbol continuous column name

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.quartiles

.gg.stat.sbin1d

Scaled 1d bin (e.g., log bins)

Parameter(s):

Name Type Description
column symbol column name
binspec (symbol; number; number) width or count (`w or `c), argument, padding
scale dict see .gg.scale
aggs dict see .st.a.*

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.sbin1d

.gg.stat.sbin2d

Scaled 2d bin (e.g., log bins)

Parameter(s):

Name Type Description
columns symbol[] 2 column names
binspec1 (symbol; number; number) width or count (`w or `c), argument, padding
binspec2 (symbol; number; number)
scale1 dict see .gg.scale
scale2 dict
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.sbin2d

.gg.stat.sbinNd

Scaled nD bin (e.g., log bins)

Parameter(s):

Name Type Description
columns symbol[] n column names
binspecs (symbol; number; number)[] n triples of: width or count (`w or `c), argument, padding
scales dict[] n scales -- see .gg.scale
aggs dict see .st.a.*
options dict | null null for defaults, see .st.sbinNd_i

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.sbinNd

.gg.stat.summary

Compute 5-number summaries of a column for each distinct value of another column

Parameter(s):

Name Type Description
catcol symbol categorical column name
numcol symbol continuous column name

Returns:

Name Type Description
<returns> dict table -> transformed table

See Also: .st.summary

.st.a.avg

Return a description of an avg aggregation.

Note: the output will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.

Parameter(s):

Name Type Description
col symbol column name to avg

See Also: .st.a.custom

Example: An average aggregation of a column

 .st.a.avg[`mycolumn]

Example: A count and avg aggregation

 .st.a.count[] , .st.a.avg[`mycolumn]

.st.a.count

Return a count aggregation description. The output will be mapped to a variable named count__.

Returns:

Name Type Description
<returns> dict

Example: A count aggregation

 .st.a.count[]

.st.a.custom

Return a description for a custom aggregation on a table. The custom function should take a list of the column's type and return a single value (e.g. avg, dev, {count distinct x}).

Parameter(s):

Name Type Description
n symbol name of resulting column
col symbol name of column to aggregate
customF fn function to aggregate sublists of the column

Returns:

Name Type Description
<returns> dict

Example: Custom average aggregator

 .st.a.custom[`outputName__; `mycolumn; avg]

Example: Count and a custom aggregator that counts occurrences

 .st.a.count[] , .st.a.custom[`output__; `mycolumn; {count where x = `something}]

.st.a.max

Return a description of a max aggregation.

Note: the output will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.

Parameter(s):

Name Type Description
col symbol column name to max

See Also: .st.a.custom

.st.a.min

Return a description of a min aggregation.

Note: the output will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.

Parameter(s):

Name Type Description
col symbol column name to min

See Also: .st.a.custom

.st.a.sum

Return a description of a sum aggregation.

Note: the output will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.

Parameter(s):

Name Type Description
col symbol column name to sum

See Also: .st.a.custom
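
Example: Sum, min and max aggregations on different columns

A sketch that joins descriptions with , in the same way as the count and avg example under .st.a.avg; the column names are illustrative.

 .st.a.sum[`price] , .st.a.min[`low] , .st.a.max[`high]

Example: Min and max of the same column

Both aggregations would otherwise be mapped to the column name, so this sketch gives each an explicit output name through .st.a.custom; the names lo__ and hi__ are illustrative.

 .st.a.custom[`lo__; `mycolumn; min] , .st.a.custom[`hi__; `mycolumn; max]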

.st.bin1d

Perform a 1D binning on a table, performing all specified aggregations.

Parameter(s):

Name Type Description
col symbol column to bin
val (symbol; number) width or count (`w or `c) and argument pair
aggs dict dictionary of aggregation descriptions (see .st.a.*)
options dict | null see .st.bin2d
table table

Returns:

Name Type Description
<returns> table binned and aggregated table

Throws:

Type Description
"column x not found"

Example: Basic categorical bin with count aggregation

      t: ([]x:45?5?`8; v:45?45);

      .st.bin1d[`x; ::; .st.a.count[]; ::; t]

 /=> x        x_start__ x_end__  count__
 /=> -----------------------------------
 /=> akkihkkm akkihkkm  dkphkccc 5      
 /=> dkphkccc dkphkccc  fchdbpfd 11     
 /=> fchdbpfd fchdbpfd  mbpcngkg 11     
 /=> mbpcngkg mbpcngkg  pdhioofe 8      
 /=> pdhioofe pdhioofe           10    

Example: Categorical bin with count and avg aggregations

      .st.bin1d[`x; ::; .st.a.count[] , .st.a.custom[`myoutput__;`v;avg]; ::; t]

 /=> x        x_start__ x_end__  count__ myoutput__
 /=> ----------------------------------------------
 /=> biamifgg biamifgg  ekeilfak 12      17.25     
 /=> ekeilfak ekeilfak  obikddhi 4       22        
 /=> obikddhi obikddhi  oebfende 11      28.63636  
 /=> oebfende oebfende  pbaioapc 7       14.42857  
 /=> pbaioapc pbaioapc           11      21        

Example: Numeric bin with count aggregation

      .st.bin1d[`v; ::; .st.a.count[]; ::; t]

 /=> v  v_start__ v_end__ count__
 /=> ----------------------------
 /=> 0  0         3       5      
 /=> 3  3         6       3      
 /=> 6  6         9       4      
 /=> 9  9         12      2      
 /=> 12 12        15      1      
 /=> 15 15        18      5      
 /=> ...

Example: Custom numeric bin with count aggregation and centered output

      .st.bin1d[`v; (`w;10;0); .st.a.count[]; enlist[`center]!enlist 1b; t]
                 // ^ 10-unit wide bins
                                              // ^ center the output point    

 /=> v  v_start__ v_end__ count__
 /=> ----------------------------
 /=> 5  0         10      13     
 /=> 15 10        20      9      
 /=> 25 20        30      8      
 /=> 35 30        40      10     
 /=> 45 40        50      5      

.st.bin2d

Perform a 2D binning and all specified aggregations on the bins of a specified table.

Parameter(s):

Name Type Description
columns symbol[] pair of column names
xbins (symbol; number) width or count (`w or `c) and argument
ybins (symbol; number) width or count (`w or `c) and argument
mods dict aggregations to perform
options dict (norm: symbol; center: boolean; hex: boolean) | null options for binning
table table

Returns:

Name Type Description
<returns> table

Throws:

Type Description
"column x not found"

See Also: .st.bin1d

Example: Basic 2d binning using all defaults

     .st.bin2d[`x`y; ::; ::; .st.a.count[]; ::; ([]x:til 45; y: til 45)]

Example: Bin with 40 x bins

     .st.bin2d[`x`y; (`c;40;0); ::; .st.a.count[]; ::; ([]x:til 45; y: til 45)]

Example: Bin with centered bins

     .st.bin2d[`x`y; ::; ::; .st.a.count[]; enlist[`center]!enlist 1b; ([]x:til 45; y: til 45)]

Example: Bin with x normalized by y

     .st.bin2d[`x`y; ::; ::; .st.a.count[]; enlist[`norm]!enlist `x; ([]x:til 45; y: 45?`a`b`c)]

Example: Bin with custom sum aggregation on x

     .st.bin2d[`x`y; ::; ::; .st.a.count[] , .st.a.custom[`newx; `x; sum]; ::; ([]x:til 45; y: 45?`a`b`c)]
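
Example: Bin with hexagonal bins

A sketch only: it assumes the hex option listed in the parameters is switched on the same way as center.

     .st.bin2d[`x`y; ::; ::; .st.a.count[]; enlist[`hex]!enlist 1b; ([]x:til 45; y: til 45)]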

.st.binNd

Perform an nD binning and all specified aggregations on the bins of a specified table.

Parameter(s):

Name Type Description
columns symbol[] list of column names to bin
xbins (symbol; number; number) width or count (`w or `c) and argument
ybins (symbol; number; number) width or count (`w or `c) and argument
mods dict aggregations to perform
options dict | null see .st.bin2d
table table

Returns:

Name Type Description
<returns> table

Throws:

Type Description
"column x not found"

See Also: .st.bin1d

.st.factorial

Calculates the factorial of a number. This uses floats, as longs overflow too quickly.

Parameter(s):

Name Type Description
x Number

Returns:

Name Type Description
float
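
Example: Why the result is a float

A sketch in plain q, not the library's implementation; fact is an illustrative name. 20! is the largest factorial that fits in a 64-bit long, so the product is taken over floats instead.

      fact: {prd 1f+til x};

      fact 5

 /=> 120f

The library function is called the same way, for example .st.factorial 25.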

.st.gen.normal

Generate random data points from a normal distribution.

Parameter(s):

Name Type Description
n Long The number of points to generate

Returns:

Name Type Description
Float[] The random data points
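
Example: Generate points and bin them into a histogram

A sketch that relies only on the signatures documented in this section; the column name x and the choice of 20 bins are illustrative.

      t: ([] x: .st.gen.normal 1000);

      .st.bin1d[`x; (`c;20;0); .st.a.count[]; ::; t]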

.st.lsqTable

Return a 1000-point sampling of the d-degree least-squares fit of the x and y columns of the given table.

Parameter(s):

Name Type Description
x symbol column name
y symbol column name
d long degree (between 0 and 4)
t table | dict table or .gg.tbl.ty instance

Returns:

Name Type Description
<returns> table 1000-point sampling
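
Example: Sample a fitted line

A sketch with illustrative data: y is linear in x plus uniform noise, so a degree-1 fit is requested.

      t: ([] x: til 20; y: 3 + (2 * til 20) + 20?1f);

      .st.lsqTable[`x; `y; 1; t]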

.st.lsquares

Return the coefficients of the d-degree least-squares fit on the given table

Parameter(s):

Name Type Description
x symbol column
y symbol column
d long degree (between 0 and 4)
table table | dict table or .gg.tbl.ty instance

Returns:

Name Type Description
number[] coefficients
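
Example: Coefficients of a quadratic fit

A sketch with illustrative data where y is exactly 1 + x*x. The order in which the coefficients are returned is not stated in this section, so no output is shown.

      t: ([] x: til 20; y: 1 + (til 20) * til 20);

      .st.lsquares[`x; `y; 2; t]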

.st.normalPDF

The probability density function of a normal distribution

Parameter(s):

Name Type Description
u Number The mean value
v Number The variance
x Number The independent variable

Returns:

Name Type Description
Number
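
Example: The density written out in plain q

A sketch of the textbook formula exp(-(x-u)^2 / (2*v)) / sqrt(2*pi*v), not necessarily how .st.normalPDF is implemented. npdf is an illustrative name; it takes its arguments in the documented order (mean, variance, x).

      npdf: {[u;v;x] d:x-u; exp[neg (d*d)%2*v] % sqrt 2*v*acos -1};

      npdf[0; 1; 0]

 /=> 0.3989423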

.st.outliers

Return the "outliers" component of a box-plot. All data points further than 1.5 times the interquartile range from the median are returned.

Parameter(s):

Name Type Description
catcol symbol categorical column
numcol symbol numeric column
table table

Returns:

Name Type Description
<returns> table

See Also: .st.summary

Example:

      t : ([]x:45?5?`8; y:45?45);

      .st.outliers[`x; `y; t]

 /=> x        y 
 /=> -----------
 /=> mijpkecf 44
 /=> kiggemin 39

.st.poissonPMF

The probability mass function of a Poisson distribution

Parameter(s):

Name Type Description
l Number The mean value
k Number The number of occurrences

Returns:

Name Type Description
Float The probability of a given outcome
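
Example: The mass function written out in plain q

A sketch of the textbook formula (l^k) * exp(-l) / k!, not necessarily how .st.poissonPMF is implemented. ppmf is an illustrative name, and k is assumed to be a non-negative whole number.

      ppmf: {[l;k] (exp[neg l] * l xexp k) % prd 1f+til k};

      ppmf[2; 3]

 /=> 0.180447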

.st.quantile

Perform a quantile transform on a numeric column

Parameter(s):

Name Type Description
x symbol numeric column name
table table

Returns:

Name Type Description
<returns> table

Throws:

Type Description
"column x not found"
"column x of type y not one of z"

Example:

      t:([]x:45?45);

      .st.quantile[`x; t]

 /=> x  fvalue__  
 /=> -------------
 /=> 0  0.01111111
 /=> 0  0.03333333
 /=> 1  0.05555556
 /=> 1  0.07777778
 /=> 1  0.1       
 /=> 2  0.1222222 
 /=> 2  0.1444444 
 /=> ...
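
Example: Reproducing the fvalue__ column by hand

The fvalue__ values shown above are consistent with (i + 0.5) % n for the i-th of n sorted values (0.5 % 45 is 0.01111111). That relationship is inferred from the sample output, not from the implementation, and qt is an illustrative name.

      qt: {[c;t] v: asc t c; n: count v; flip (c,`fvalue__)!(v; (0.5 + til n) % n)};

      qt[`x; t]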

.st.quartiles

Compute the quartiles of column y of the table for each distinct value of column x.

The output columns are the following:

  • the first column has the same name as the given categorical column (x)
  • q1__ - first quartile
  • q2__ - second quartile
  • q3__ - third quartile

Parameter(s):

Name Type Description
x symbol
y symbol
table table

Returns:

Name Type Description
<returns> table

Throws:

Type Description
"column x not found"
"column x of type y not one of z"

See Also: .st.summary

Example:

      t: ([]x:45?5?`8; y:45?45);

      .st.quartiles[`x; `y; t]

 /=> x        q1__ q2__ q3__
 /=> -----------------------
 /=> npccjbfg 4    23   32.5
 /=> kcjfooab 7.5  23.5 31  
 /=> jnmhejla 14   23   30  
 /=> iiphmkna 16   25.5 31  
 /=> gnlighkg 22   34   41  

.st.rollup.col

Calculate a column's statistics if the column is of type 1, 4-10, or 12-19. Otherwise, return an empty stats dictionary.

Parameter(s):

Name Type Description
col list The column, as a list, on which stats are to be calculated.

Returns:

Name Type Description
<returns> dict The stats keyed by statistical operation.
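
Example: Column types that do and do not produce statistics

A sketch with illustrative data. In q's type numbering, 1 is boolean, 4-10 run from byte to char, and 12-19 are the temporal types, which leaves out guid (2) and symbol (11).

      c: 1.5 2.5 4 7f;

      type c

 /=> 9h

      .st.rollup.col c

      .st.rollup.col `a`b`c     / symbol list (type 11): expect an empty stats dictionary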

.st.rollup.table

For a given table, construct the statistics on each column

Parameter(s):

Name Type Description
table table The (keyed) table for which stats should be calculated.

Returns:

Name Type Description
<returns> dict A dictionary keyed by column name containing each column's calculated stats.

.st.sbin1d

Perform a scaled 1d bin on a table. The given scale is applied to the data before binning.

Parameter(s):

Name Type Description
col symbol
val (symbol; number) width or count (`w or `c) and arg for bin
scale dict scale (see .gg.scale)
aggs dict aggregations (see .st.a.*)
options dict | null see .st.bin2d
table table

Returns:

Name Type Description
<returns> table

See Also: .st.bin1d

.st.sbin2d

Perform a scaled 2d binning and the specified aggregations.

Parameter(s):

Name Type Description
columns symbol[] pair of symbols to bin
xbins (symbol; number; number) (`w or `c; arg; padding)
ybins (symbol; number; number) (`w or `c; arg; padding)
xscale dict scale for the x axis
yscale dict scale for the y axis
mods dict aggregations
options dict | null see .st.bin2d
table table

See Also: .st.bin1d

.st.sbinNd

Perform an nD binning using the given scales. Bins can be specified by width or by count. The padding in a bin specification increases the range over which the bins are split. For example, categorical columns would likely specify a padding value of 1 so that bins (`c;5;1) are spaced on even numbers, rather than over (num distinct) % 5 intervals.

Parameter(s):

Name Type Description
columns symbol[] list of column names
binDescr (symbol; number; number)[] bin description (`w or `c; arg; padding)
scales dict[] list of scales for each variable
mods dict list of aggregations to perform
options dict | null see .st.bin2d
table table

See Also: .st.bin1d

.st.sbinNd_i

Perform an nD binning using the given scales. Bins can be specified by width or by count. The padding in a bin specification increases the range over which the bins are split. For example, categorical columns would likely specify a padding value of 1 so that bins (`c;5;1) are spaced on even numbers, rather than over (num distinct) % 5 intervals.

Parameter(s):

Name Type Description
columns symbol[] list of column names
binDescrs ((symbol; number; number) | null)[] bin description (`w or `c; arg; padding), or null to accept defaults
scales dict[] list of scales for each variable
mods dict list of aggregations to perform
options dict | null see .st.bin2d
table table

Returns:

Name Type Description
<returns> table binned data

Throws:

Type Description
"column x not found"

Note: this binning function should avoid duplicating data at all costs.

See Also: .st.bin1d

.st.summary

Returns a five-number-summary-style report for each subset of the data split on distinct values of one column (x). The x column should be categorical, while the y column should be numeric/continuous.

The output columns are the following:

  • the first column has the same name as the given categorical column (x)
  • q1__ - first quartile
  • q2__ - second quartile
  • q3__ - third quartile
  • min__ - min
  • max__ - max
  • med__ - median
  • mean__ - average (avg)
  • upper__ - upper hinge (1.5 * interquartile range from median)
  • lower__ - lower hinge (1.5 * interquartile range from median)

Parameter(s):

Name Type Description
x symbol categorical column name
y symbol continuous column name
table table

Returns:

Name Type Description
<returns> table

Throws:

Type Description
"column x not found"
"column x of type y not one of z"

See Also: .st.outliers, .st.quartiles

Example:

      t:([]x:45?5?`8; y:45?45);

      .st.summary[`x; `y; t]

 /=> x        q1__ q2__ q3__ min__ max__ med__ mean__   upper__ lower__
 /=> ------------------------------------------------------------------
 /=> npccjbfg 4    23   32.5 0     42    23    18.88889 42      0      
 /=> kcjfooab 7.5  23.5 31   1     42    23    20       42      1      
 /=> jnmhejla 14   23   30   7     33    22.5  21       33      7      
 /=> iiphmkna 16   25.5 31   13    41    20    24.2     41      13     
 /=> gnlighkg 22   34   41   17    42    30    29.66667 42      17