Statistical Transformation Functions
.st provides statistical transformation implementations, primarily for use with GG graphics. See .gg.stat for GG wrappers.
.gg.stat.ty.applyF
Parameters:
Name | Type | Description |
---|---|---|
<param> | dict | |
<param>.applyF | fn (table) → table | |
<param>.colmap | symbol[] |
Returns:
Type | Description |
---|---|
fn (table) → table | a transform function |
.gg.stat.ty.with.colmap
Parameters:
Name | Type | Description |
---|---|---|
symbol[] | a list of symbols that will be output identically to the input | |
<param> | dict | |
<param>.applyF | fn (table) → table | |
<param>.colmap | symbol[] |
Returns:
Name | Type | Description |
---|---|---|
<returns> | dict | |
<returns>.applyF | fn (table) → table | |
<returns>.colmap | symbol[] |
.gg.stat.ty.colmap
Parameters:
Name | Type | Description |
---|---|---|
<param> | dict | |
<param>.applyF | fn (table) → table | |
<param>.colmap | symbol[] |
Returns:
Type | Description |
---|---|
symbol[] | a list of symbols that will be output identically to the input |
.gg.stat.ty.new
Parameter:
Name | Type | Description |
---|---|---|
(fn (table) → table; symbol[]) | A tuple of: a transform function, and a list of symbols that will be output identically to the input |
.gg.stat.bin1d
A 1d binning stat
Parameters:
Name | Type | Description |
---|---|---|
column | symbol | |
binspec | (symbol; number; number) | width or count (`w or `c), argument, padding |
aggs | dict | see .st.a.* |
options | dict | null | null for defaults, see .st.sbinNd_i |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.bin1d
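A rough usage sketch, based only on the signatures documented on this page. The binspec triple and the null options argument follow the parameter table above. Extracting the transform with .gg.stat.ty.applyF is an assumption about how the returned object is meant to be consumed, and the table t is hypothetical sample data.

```q
/ hypothetical sample table
t:([] x:45?5?`8; v:45?45)

/ build a stat object: 10-unit-wide bins on v, counting rows per bin
s:.gg.stat.bin1d[`v; (`w;10;0); .st.a.count[]; ::]

/ assumption: the transform function is retrieved with .gg.stat.ty.applyF
f:.gg.stat.ty.applyF s

/ apply the transform to the table
f t
```

If the assumption holds, the result should have the same shape as calling .st.bin1d directly with the table as its last argument.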
.gg.stat.bin2d
2d binning transform
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | list of two column names |
binspec1 | (symbol; number; number) | width or count (`w or `c), argument, padding |
binspec2 | (symbol; number; number) | |
aggs | dict | see .st.a.* |
options | dict | null | null for defaults, see .st.sbinNd_i |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.bin2d
.gg.stat.binNd
nD binning transform
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | list of n column names |
binspecs | (symbol; number; number)[] | list of n triples of: width or count (`w or `c), argument, padding |
aggs | dict | see .st.a.* |
options | dict | null | null for defaults, see .st.sbinNd_i |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.binNd
.gg.stat.lsquares
Least-squares regression transform function.
Produce a table of values along the least-squares regression of the input table
Parameters:
Name | Type | Description |
---|---|---|
x | symbol | column |
y | symbol | column |
degree | number | degree of the least-squares fit polynomial |
Returns:
Type | Description |
---|---|
dict | transform object |
See Also: .st.lsqTable
.gg.stat.mavg
Moving average statistic
Parameters:
Name | Type | Description |
---|---|---|
num | long | the number of values to be averaged at each point |
x | symbol | column |
y | symbol | column |
g | symbol | null | group column |
Returns:
Type | Description |
---|---|
dict | transform object |
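To illustrate what averaging num values at each point means, here is a minimal sketch using q's built-in mavg. Whether .gg.stat.mavg uses this built-in internally is an assumption; the table t and the output column m are hypothetical.

```q
/ built-in moving average: mean of up to the last 3 values at each point
3 mavg 1 2 3 4 5
/=> 1 1.5 2 3 4

/ hypothetical grouped case: a 2-point moving average of y within each group g
t:([] g:`a`a`a`b`b`b; x:1 2 3 1 2 3; y:1 2 3 10 20 30f)
update m:2 mavg y by g from t
/ m is 1 1.5 2.5 for group a and 10 15 25 for group b
```

Note that the first few points of each series are averaged over fewer than num values, because no earlier values exist.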
.gg.stat.outliers
Compute the outliers component of a box-plot
Parameters:
Name | Type | Description |
---|---|---|
catcol | symbol | categorical column name |
numcol | symbol | continuous column name |
Returns:
Type | Description |
---|---|
fn | table -> transformed table |
See Also: .st.outliers
.gg.stat.pie
Summarizing 1d bin transform with an additional constant 0 column (const__)
Parameters:
Name | Type | Description |
---|---|---|
column | symbol | column name |
aggs | dict | aggregators to use (see .st.a) |
Returns:
Type | Description |
---|---|
dict | new stat transform |
.gg.stat.quantile
Compute the quantiles of a numeric column
Parameter:
Name | Type | Description |
---|---|---|
column | symbol | column name |
Returns:
Type | Description |
---|---|
dict | table -> quantile table |
See Also: .st.quantile
.gg.stat.quartiles
Compute the quartiles of a column for each distinct value of another column
Parameters:
Name | Type | Description |
---|---|---|
catcol | symbol | categorical column name |
numcol | symbol | continuous column name |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.quartiles
.gg.stat.sbin1d
Scaled 1d bin (e.g., log bins)
Parameters:
Name | Type | Description |
---|---|---|
column | symbol | column name |
binspec | (symbol; number; number) | width or count (`w or `c), argument, padding |
sc | dict | see .gg.scale |
aggs | dict | see .st.a.* |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.sbin1d
.gg.stat.sbin2d
Scaled 2d bin (e.g., log bins)
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | 2 column names |
binspec1 | (symbol; number; number) | width or count (`w or `c), argument, padding |
binspec2 | (symbol; number; number) | |
scale1 | dict | see .gg.scale |
scale2 | dict | |
aggs | dict | see .st.a.* |
options | dict | null | null for defaults, see .st.sbinNd_i |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.sbin2d
.gg.stat.sbinNd
Scaled nD bin (e.g., log bins)
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | n column names |
binspecs | (symbol; number; number)[] | n triples of: width or count (`w or `c), argument, padding |
scales | dict[] | n scales -- see .gg.scale |
aggs | dict | see .st.a.* |
options | dict | null | null for defaults, see .st.sbinNd_i |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.sbinNd
.gg.stat.summary
Compute 5-number summaries of a column for each distinct value of another column
Parameters:
Name | Type | Description |
---|---|---|
catcol | symbol | categorical column name |
numcol | symbol | continuous column name |
Returns:
Type | Description |
---|---|
dict | table -> transformed table |
See Also: .st.summary
.st.a.avg
Return a description of an avg aggregation.
Note - the aggregation will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.
Parameter:
Name | Type | Description |
---|---|---|
col | symbol | column name to avg |
See Also: .st.a.custom
Example: An average aggregation of a column
.st.a.avg[`mycolumn]
Example: A count and avg aggregation
.st.a.count[] , .st.a.avg[`mycolumn]
.st.a.count
Return a count aggregation description. The output will be mapped to a variable named count__.
Returns:
Type | Description |
---|---|
dict |
Example: A count aggregation
.st.a.count[]
.st.a.custom
Return a description for a custom aggregation on a table. The custom function should take a list of the type of the column, and return a single value (e.g. avg, dev, {count distinct x}, etc.).
Parameters:
Name | Type | Description |
---|---|---|
n | symbol | name of resulting column |
col | symbol | name of column to aggregate |
customF | fn | function to aggregate sublists of the column |
Returns:
Type | Description |
---|---|
dict |
Example: Custom average aggregator
.st.a.custom[`outputName__; `mycolumn; avg]
Example: Count and a custom aggregator that counts occurrences
.st.a.count[] , .st.a.custom[`output__; `mycolumn; {count where x = `something}]
.st.a.max
Return a description of a max aggregation.
Note - the output will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.
Parameter:
Name | Type | Description |
---|---|---|
col | symbol | column name to max |
See Also: .st.a.custom
.st.a.min
Return a description of a min aggregation.
Note - the output will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.
Parameter:
Name | Type | Description |
---|---|---|
col | symbol | column name to min |
See Also: .st.a.custom
.st.a.sum
Return a description of a sum aggregation.
Note - the output will be mapped to the column name. For an aggregation with an explicit output mapping (to avoid collisions with other aggregations on the same column), see .st.a.custom.
Parameter:
Name | Type | Description |
---|---|---|
col | symbol | column name to sum |
See Also: .st.a.custom
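The notes on output mapping suggest that two built-in aggregations on the same column would collide. The sketch below shows one way to avoid that, using only .st.a.count, .st.a.custom, and .st.bin1d as documented on this page. The table t and the output names vmin__ and vmax__ are hypothetical.

```q
/ hypothetical sample table
t:([] x:45?5?`8; v:45?45)

/ .st.a.min[`v] and .st.a.max[`v] would both map their output to the name v,
/ so give each an explicit output name through .st.a.custom instead
aggs:.st.a.count[] , .st.a.custom[`vmin__;`v;min] , .st.a.custom[`vmax__;`v;max]

/ categorical bin on x carrying all three aggregations
.st.bin1d[`x; ::; aggs; ::; t]
```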
.st.bin1d
Perform a 1D binning on a table, performing all specified aggregations.
Parameters:
Name | Type | Description |
---|---|---|
col | symbol | column to bin |
val | (symbol; number) | width or count (`w or `c ) and argument pair |
aggs | dict | dictionary of aggregation descriptions (see .st.a.*) |
options | dict | null | see .st.bin2d |
table | table |
Returns:
Type | Description |
---|---|
table | binned and aggregated table |
Throws:
Type | Description |
---|---|
"column x not found" |
Example: Basic categorical bin with count aggregation
t: ([]x:45?5?`8; v:45?45);
.st.bin1d[`x; ::; .st.a.count[]; ::; t]
/=> x x_start__ x_end__ count__
/=> -----------------------------------
/=> akkihkkm akkihkkm dkphkccc 5
/=> dkphkccc dkphkccc fchdbpfd 11
/=> fchdbpfd fchdbpfd mbpcngkg 11
/=> mbpcngkg mbpcngkg pdhioofe 8
/=> pdhioofe pdhioofe 10
Example: Categorical bin with count and avg aggregations
.st.bin1d[`x; ::; .st.a.count[] , .st.a.custom[`myoutput__;`v;avg]; ::; t]
/=> x x_start__ x_end__ count__ myoutput__
/=> ----------------------------------------------
/=> biamifgg biamifgg ekeilfak 12 17.25
/=> ekeilfak ekeilfak obikddhi 4 22
/=> obikddhi obikddhi oebfende 11 28.63636
/=> oebfende oebfende pbaioapc 7 14.42857
/=> pbaioapc pbaioapc 11 21
Example: Numeric bin with count aggregation
.st.bin1d[`v; ::; .st.a.count[]; ::; t]
/=> v v_start__ v_end__ count__
/=> ----------------------------
/=> 0 0 3 5
/=> 3 3 6 3
/=> 6 6 9 4
/=> 9 9 12 2
/=> 12 12 15 1
/=> 15 15 18 5
/=> ...
Example: Custom numeric bin with count aggregation and centered output
.st.bin1d[`v; (`w;10;0); .st.a.count[]; enlist[`center]!enlist 1b; t]
// (`w;10;0): 10-unit wide bins
// enlist[`center]!enlist 1b: center the output point
/=> v v_start__ v_end__ count__
/=> ----------------------------
/=> 5 0 10 13
/=> 15 10 20 9
/=> 25 20 30 8
/=> 35 30 40 10
/=> 45 40 50 5
.st.bin2d
Perform a 2D binning and all specified aggregations on the bins of a specified table.
If hexbins are requested, normalization and centering are disabled.
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | pair of column names |
xbins | (symbol; number) | width or count (`w or `c ) and argument |
ybins | (symbol; number) | width or count (`w or `c ) and argument |
mods | dict | aggregations to perform |
options | dict (norm: symbol; center: boolean; hex: boolean) | null | options for binning |
table | table |
Returns:
Type | Description |
---|---|
table |
Throws:
Type | Description |
---|---|
"column x not found" |
See Also: .st.bin1d
Example: Basic 2d binning using all defaults
.st.bin2d[`x`y; ::; ::; .st.a.count[]; ::; ([]x:til 45; y: til 45)]
Example: Bin with 40 x bins
.st.bin2d[`x`y; (`c;40;0); ::; .st.a.count[]; ::; ([]x:til 45; y: til 45)]
Example: Bin with centered bins
.st.bin2d[`x`y; ::; ::; .st.a.count[]; enlist[`center]!enlist 1b; ([]x:til 45; y: til 45)]
Example: Bin with x normalized by y
.st.bin2d[`x`y; ::; ::; .st.a.count[]; enlist[`norm]!enlist `x; ([]x:til 45; y: 45?`a`b`c)]
Example: Bin with custom sum aggregation on x
.st.bin2d[`x`y; ::; ::; .st.a.count[] , .st.a.custom[`newx; `x; sum]; ::; ([]x:til 45; y: 45?`a`b`c)]
Example: Bin using a by-clause and a bin
.st.bin2d[`x`y; ::; `by; .st.a.count[] , .st.a.custom[`newx; `x; sum]; ::; ([]x:til 45; y: 45?`a`b`c)]
.st.binNd
Perform an nD binning and all specified aggregations on the bins of a specified table.
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | list of column names to bin |
xbins | (symbol; number; number) | width or count (`w or `c ) and argument |
ybins | (symbol; number; number) | width or count (`w or `c ) and argument |
mods | dict | aggregations to perform |
options | dict | null | see .st.bin2d |
table | table |
Returns:
Type | Description |
---|---|
table |
Throws:
Type | Description |
---|---|
"column x not found" |
See Also: .st.bin1d
.st.factorial
Calculates the factorial of a number. This uses floats, as longs overflow too quickly.
Parameter:
Name | Type | Description |
---|---|---|
x | Number |
Returns:
Type | Description |
---|---|
float |
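A minimal sketch of a float-based factorial in plain q, to show why floats help. This is an illustration, not necessarily how .st.factorial is implemented; the name fact is hypothetical and the sketch assumes a non-negative integer argument.

```q
/ factorial as a float: product of 1f 2f ... x (the empty product is 1f)
fact:{prd 1f+til x}

fact 5
/=> 120f
fact 0
/=> 1f

/ 21! is about 5.1e19, beyond the largest long (about 9.2e18),
/ but it is still representable (approximately) as a float
fact 21
```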
.st.gen.normal
Generate a normal distribution
Parameter:
Name | Type | Description |
---|---|---|
n | Long | The number of points to generate |
Returns:
Type | Description |
---|---|
Float[] | The random data points |
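One common way to generate normally distributed points is the Box-Muller transform, sketched below in plain q. This is an illustration only: the source does not say which method .st.gen.normal uses, and the choice of a standard normal (mean 0, variance 1) is an assumption. The names pi and normal are hypothetical.

```q
/ Box-Muller transform: two independent uniform draws -> one standard normal draw
pi:acos -1

/ 1f-n?1f lies in (0,1], which keeps log away from zero
normal:{[n] sqrt[-2*log 1f-n?1f]*cos 2*pi*n?1f}

z:normal 10000

/ for a large sample, avg z should be close to 0 and dev z close to 1
(avg z; dev z)
```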
.st.lsqTable
Return a 1000-point sampling of the d-degree least squares fit of the x and y column of the given table
Parameters:
Name | Type | Description |
---|---|---|
x | symbol | column name |
y | symbol | column name |
d | long | degree (between 0 and 4) |
t | table | dict | table or .gg.tbl.ty instance |
Returns:
Type | Description |
---|---|
table | 1000-point sampling |
.st.lsqTableGrouped
Return a 1000-point sampling of the d-degree least squares fit of the x and y columns of the given table, grouped by column g
Parameters:
Name | Type | Description |
---|---|---|
x | symbol | column name |
y | symbol | column name |
g | symbol | group column name |
d | long | degree (between 0 and 4) |
t | table | dict | table or .gg.tbl.ty instance |
Returns:
Type | Description |
---|---|
table | 1000-point sampling |
.st.lsquares
Return the coefficients of the d-degree least-squares fit on the given table
Parameters:
Name | Type | Description |
---|---|---|
x | symbol | column |
y | symbol | column |
d | long | degree (between 0 and 4) |
table | table | dict | table or .gg.tbl.ty instance |
Returns:
Type | Description |
---|---|
number[] | coefficients |
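For readers unfamiliar with polynomial least-squares fitting, the following sketch shows the idea with q's built-in lsq. It is not taken from the .st.lsquares source; the name polyfit, its argument order, and the coefficient ordering (lowest power first) are assumptions made for illustration.

```q
/ least-squares polynomial fit using the built-in lsq
/ x, y: numeric vectors of equal length; d: polynomial degree
/ returns d+1 coefficients, lowest power first
polyfit:{[x;y;d] first enlist["f"$y] lsq x xexp/: til 1+d}

/ hypothetical data lying exactly on y = 1 + 2x,
/ so the fitted coefficients should be approximately 1 2
polyfit[0 1 2 3f; 1 3 5 7f; 1]
```

Each row of the matrix built by xexp holds one power of x, so lsq solves for the row of coefficients that best reproduces y.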
.st.normalPDF
The probability density function of a normal distribution
Parameters:
Name | Type | Description |
---|---|---|
u | Number | The mean value |
v | Number | The variance |
x | Number | The independent variable |
Returns:
Type | Description |
---|---|
Number |
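For reference, the standard normal density written in terms of the parameters above (mean u, variance v, independent variable x) is shown below. That .st.normalPDF evaluates exactly this expression is an assumption based on the parameter descriptions.

$$
f(x) = \frac{1}{\sqrt{2\pi v}} \exp\left(-\frac{(x-u)^2}{2v}\right)
$$

Note that v is the variance, not the standard deviation, so it appears unsquared in both the normalizing factor and the exponent.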
.st.outliers
Return the "outliers" component of a box-plot. All data points further than 1.5 times the interquartile range from the median are returned.
Parameters:
Name | Type | Description |
---|---|---|
catcol | symbol | categorical column |
numcol | symbol | numeric column |
table | table |
Returns:
Type | Description |
---|---|
table |
See Also: .st.summary
Example:
t : ([]x:45?5?`8; y:45?45);
.st.outliers[`x; `y; t]
/=> x y
/=> -----------
/=> mijpkecf 44
/=> kiggemin 39
.st.poissonPMF
The probability mass function of a Poisson distribution
Parameters:
Name | Type | Description |
---|---|---|
l | Number | The mean value |
k | Number | The number of occurrences |
Returns:
Type | Description |
---|---|
Float | The probability of a given outcome |
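For reference, the standard Poisson probability mass function in terms of the parameters above (mean l, number of occurrences k) is shown below. That .st.poissonPMF evaluates exactly this expression is an assumption based on the parameter descriptions.

$$
P(k) = \frac{l^{k} e^{-l}}{k!}, \qquad k = 0, 1, 2, \ldots
$$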
.st.quantile
Perform a quantile transform on a numeric column
Parameters:
Name | Type | Description |
---|---|---|
x | symbol | numeric column name |
table | table |
Returns:
Type | Description |
---|---|
table |
Throws:
Type | Description |
---|---|
"column x not found" | |
"column x of type y not one of z" |
Example:
t:([]x:45?45);
.st.quantile[`x; t]
/=> x fvalue__
/=> -------------
/=> 0 0.01111111
/=> 0 0.03333333
/=> 1 0.05555556
/=> 1 0.07777778
/=> 1 0.1
/=> 2 0.1222222
/=> 2 0.1444444
/=> ...
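The fvalue__ column in the example output is consistent with sorting the column and assigning the i-th of n sorted values (counting from 0) the fraction (i+0.5)%n, for example 0.5%45 = 0.01111111 for the first row. The sketch below reproduces that pattern in plain q. The rule is inferred from the output above rather than taken from the .st.quantile source, and the name qsketch is hypothetical.

```q
/ sort by x, then attach the fraction (i+0.5)%n to the i-th sorted row
qsketch:{[t] update fvalue__:(0.5+til count x)%count x from `x xasc t}

exec fvalue__ from qsketch ([] x:3 1 2)
/=> 0.1666667 0.5 0.8333333
```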
.st.quartiles
Take the quartiles of column y of the table for each distinct column x value.
The output columns are the following:
- the first column has the same name as the given categorical column (x)
- q1__: first quartile
- q2__: second quartile
- q3__: third quartile
Parameters:
Name | Type | Description |
---|---|---|
x | symbol | |
y | symbol | |
table | table |
Returns:
Type | Description |
---|---|
table |
Throws:
Type | Description |
---|---|
"column x not found" | |
"column x of type y not one of z" |
See Also: .st.summary
Example:
t: ([]x:45?5?`8; y:45?45);
.st.quartiles[`x; `y; t]
/=> x q1__ q2__ q3__
/=> -----------------------
/=> npccjbfg 4 23 32.5
/=> kcjfooab 7.5 23.5 31
/=> jnmhejla 14 23 30
/=> iiphmkna 16 25.5 31
/=> gnlighkg 22 34 41
.st.rollup.col
Calculate a column's statistics if the column is of type 1, 4-10, or 12-19. Otherwise, return an empty stats dictionary.
Parameter:
Name | Type | Description |
---|---|---|
col | list | The column, as a list, on which stats are to be calculated. |
Returns:
Type | Description |
---|---|
dict | The stats keyed by statistical operation. |
.st.rollup.table
For a given table, construct the statistics on each column
Parameter:
Name | Type | Description |
---|---|---|
table | table | The (keyed) table for which stats should be calculated. |
Returns:
Type | Description |
---|---|
dict | A dictionary keyed by column name, containing each column's calculated stats. |
.st.sbin1d
Perform a scaled 1d bin on a table. The given scale is applied to the data before binning.
Parameters:
Name | Type | Description |
---|---|---|
col | symbol | |
val | (symbol; number) | width or count (`w or `c ) and arg for bin |
scale | dict | scale (see .gg.scale) |
aggs | dict | aggregations (see .st.a.*) |
options | dict | null | see .st.bin2d |
table | table |
Returns:
Type | Description |
---|---|
table |
See Also: .st.bin1d
.st.sbin2d
Perform a 2d binning and necessary aggregations.
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | pair of symbols to bin |
xbins | (symbol; number; number) | (`w or `c ; arg; padding) |
ybins | (symbol; number; number) | (`w or `c ; arg; padding) |
xscale | dict | scale for the x axis |
yscale | dict | scale for the y axis |
mods | dict | aggregations |
options | dict | null | see .st.bin2d |
table | table |
See Also: .st.bin1d
.st.sbinNd
Perform an nD binning using the given scales. Bins can be specified as either a width or a count. The padding of a bin is added to increase the range over which the bins are split. For example, categorical columns would likely specify a padding value of 1 so that bins (`c;5;1) are spaced on even numbers, rather than over (num distinct) % 5 intervals.
Parameters:
Name | Type | Description |
---|---|---|
columns | symbol[] | list of column names |
binDescr | (symbol; number; number)[] | bin description (`w or `c ; arg; padding) |
scales | dict[] | list of scales for each variable |
mods | dict | list of aggregations to perform |
options | dict | null | see .st.bin2d |
table | table |
See Also: .st.bin1d
.st.sbinNd_i
Perform an nD binning using the given scales. Bins can be specified as either a width or a count. The padding of a bin is added to increase the range over which the bins are split. For example, categorical columns would likely specify a padding value of 1 so that bins (`c;5;1) are spaced on even numbers, rather than over (num distinct) % 5 intervals.
Parameters:
Name | Type | Description |
---|---|---|
cs | symbol[] | list of column names |
descr | ((symbol; number; number) | null)[] | bin description (`w or `c ; arg; padding), or null to accept defaults |
scales | dict[] | list of scales for each variable |
mods | dict | list of aggregations to perform |
opts | dict | null | see .st.bin2d |
t | table |
Returns:
Type | Description |
---|---|
table | binned data |
Throws:
Type | Description |
---|---|
"column x not found" |
Note: this binning function should avoid duplicating data at all costs.
See Also: .st.bin1d
.st.summary
Returns a five-number-summary-style report for each subset of the data split on distinct values of one column (x). The x column should be categorical, while the y column should be numeric/continuous.
The output columns are the following:
- the first column has the same name as the given categorical column (x)
- q1__: first quartile
- q2__: second quartile
- q3__: third quartile
- min__: min
- max__: max
- med__: median
- mean__: average (avg)
- upper__: upper hinge (1.5 * interquartile range from median)
- lower__: lower hinge (1.5 * interquartile range from median)
Parameters:
Name | Type | Description |
---|---|---|
x | symbol | categorical column name |
y | symbol | continuous column name |
table | table |
Returns:
Type | Description |
---|---|
table |
Throws:
Type | Description |
---|---|
"column x not found" | |
"column x of type y not one of z" |
See Also: .st.outliers, .st.quartiles
Example:
t:([]x:45?5?`8; y:45?45);
.st.summary[`x; `y; t]
/=> x q1__ q2__ q3__ min__ max__ med__ mean__ upper__ lower__
/=> ------------------------------------------------------------------
/=> npccjbfg 4 23 32.5 0 42 23 18.88889 42 0
/=> kcjfooab 7.5 23.5 31 1 42 23 20 42 1
/=> jnmhejla 14 23 30 7 33 22.5 21 33 7
/=> iiphmkna 16 25.5 31 13 41 20 24.2 41 13
/=> gnlighkg 22 34 41 17 42 30 29.66667 42 17