Utility functions

.ml Utility functions
  arange           Evenly-spaced values within a range
  combs            n linear combinations of k numbers
  df2tab           kdb+ table from a pandas dataframe
  df2tabTimezone   Pandas dataframe to kdb+ conversion handling dates/times/timezones
  eye              Identity matrix
  iMax             Index of maximum element of a list
  iMin             Index of minimum element of a list
  linearSpace      List of evenly-spaced values
  range            Range of values
  shape            Shape of a matrix
  tab2df           Pandas dataframe from a q table
  trainTestSplit   Split into training and test sets
The toolkit contains general-purpose utility functions that are used across many applications and do not belong to a single category such as statistics or preprocessing.
.ml.arange
Evenly-spaced values
.ml.arange[start;end;step]
Where
start is the start of the interval (inclusive)
end is the end of the interval (non-inclusive)
step is the spacing between values
returns a vector of evenly-spaced values between start and end in steps of length step.
q).ml.arange[1;10;1]
1 2 3 4 5 6 7 8 9
q).ml.arange[6.25;10.5;0.05]
6.25 6.3 6.35 6.4 6.45 6.5 6.55 6.6 6.65 6.7 6.75 6.8 6.85 6.9 6.95 7 7.05 7...
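The half-open interval semantics described above can be sketched in plain q. This is a minimal illustration and not necessarily the toolkit's own definition; the name arangeSketch is hypothetical.

```q
/ hypothetical sketch: values start, start+step, ... strictly below end
/ 0| guards against a negative count when end is below start
arangeSketch:{[start;end;step] start+step*til 0|ceiling(end-start)%step}

arangeSketch[1;10;1]     / 1 2 3 4 5 6 7 8 9
arangeSketch[0;1;0.25]   / 0 0.25 0.5 0.75
arangeSketch[5;1;1]      / empty list, since end is below start
```

The end value itself never appears in the result, which is the main difference from .ml.linearSpace.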
.ml.combs
Unique combinations of a vector or matrix
.ml.combs[n;degree]
Where
n is the integer number of values required for combinations
degree is the degree of the combinations to be produced
returns the unique combinations of degree indices drawn from til n, which can be used to index into a vector or matrix.
q).ml.combs[3;2]
0 1
0 2
1 2
q)show k:5?`1
`p`j`e`o`b
q)k .ml.combs[count k;4] / display values in combinations
p j e o
p j e b
p j o b
p e o b
j e o b
q)show m:(0 1 2;2 3 4;4 5 6;6 7 8)
0 1 2
2 3 4
4 5 6
6 7 8
q)m .ml.combs[count m;3]
0 1 2 2 3 4 4 5 6
0 1 2 2 3 4 6 7 8
0 1 2 4 5 6 6 7 8
2 3 4 4 5 6 6 7 8
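To make the index-based result concrete, the following brute-force sketch builds every tuple of indices and keeps only the strictly increasing ones. It is an illustration of what is returned, not the toolkit's implementation, and it is far less efficient; the name combsSketch is hypothetical.

```q
/ hypothetical brute-force sketch of index combinations
combsSketch:{[n;degree]
  / all degree-length tuples of indices from til n
  tuples:(degree-1){x cross til y}[;n]/enlist each til n;
  / keep tuples whose indices are strictly increasing
  tuples where {all 0<1_deltas x} each tuples}

combsSketch[3;2]                / (0 1;0 2;1 2)
`a`b`c combsSketch[3;2]         / (`a`b;`a`c;`b`c)
```

Because the result is a list of index lists, applying a vector or matrix to it, as in the examples above, yields the combinations of the underlying values.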
.ml.df2tab
Convert pandas dataframe to q table
.ml.df2tab[tab]
Where
tab is an embedPy representation of a Pandas dataframe
returns tab as a q table.
q)p)import pandas as pd
q)print t:.p.eval"pd.DataFrame({'fcol':[0.1,0.2,0.3,0.4,0.5],'jcol':[10,20,30,40,50]})"
fcol jcol
0 0.1 10
1 0.2 20
2 0.3 30
3 0.4 40
4 0.5 50
q).ml.df2tab t
fcol jcol
---------
0.1 10
0.2 20
0.3 30
0.4 40
0.5 50
q)print kt:t[`:set_index]`jcol
fcol
jcol
10 0.1
20 0.2
30 0.3
40 0.4
50 0.5
q).ml.df2tab kt
jcol| fcol
----| ----
10 | 0.1
20 | 0.2
30 | 0.3
40 | 0.4
50 | 0.5
Index columns
This function assumes that a single unnamed Python index column is to be removed, in which case it returns an unkeyed table. All other variants of Python index columns map to q key columns. For example, a dataframe with two or more indexes maps to two or more q keys, while a single named Python index column becomes the key of a keyed table.
Note
This function is a wrapper around .ml.df2tabTimezone. By default, conversions within this function convert datetime.date and datetime.time types to foreign objects, and convert numpy timezone types to their UTC representation. These defaults were chosen because of Python-related computational inefficiencies in converting to native q types and to local-time representations respectively.
.ml.df2tabTimezone
Convert a pandas dataframe containing datetime objects to a q table
.ml.df2tabTimezone[tab;local;qObj]
Where
tab is an embedPy representation of a Pandas dataframe
local is a boolean indicating if timezone (tz) objects are to be converted to local time (1b) or UTC (0b)
qObj is a boolean indicating if Python datetime.date/datetime.time objects are returned as q objects (1b) or foreign objects (0b)
returns a q table.
q)p)import pandas as pd
q)p)import datetime
q)p)import numpy as np
q)p)dtdf=pd.DataFrame(
{'time':[datetime.time(12, 10, 30,500),datetime.time(12, 13, 30,200)],
'timed':[datetime.timedelta(hours=-5),datetime.timedelta(seconds=1000)],
'datetime':[np.datetime64('2005-02-25T03:30'),np.datetime64('2015-12-22')]})
q)p)dtdf['dt_with_tz']=dtdf.datetime.dt.tz_localize('CET')
q)print dttab:.p.get[`dtdf]
time timed datetime dt_with_tz
0 12:10:30.000500 -1 days +19:00:00 2005-02-25 03:30:00 2005-02-25 03:30:00+01:00
1 12:13:30.000200 00:16:40 2015-12-22 00:00:00 2015-12-22 00:00:00+01:00
/ default behavior (tz -> UTC, time -> foreign)
q).ml.df2tabTimezone[dttab;0b;0b]
time timed datetime dt_with_tz
-----------------------------------------------------------------------------------------
foreign -0D05:00:00.000000000 2005.02.25D03:30:00.000000000 2005.02.25D02:30:00.000000000
foreign 0D00:16:40.000000000 2015.12.22D00:00:00.000000000 2015.12.21D23:00:00.000000000
/ default time conversion, local tz conversion
q).ml.df2tabTimezone[dttab;1b;0b]
time timed datetime dt_with_tz
-----------------------------------------------------------------------------------------
foreign -0D05:00:00.000000000 2005.02.25D03:30:00.000000000 2005.02.25D03:30:00.000000000
foreign 0D00:16:40.000000000 2015.12.22D00:00:00.000000000 2015.12.22D00:00:00.000000000
/ default tz conversion and conversion to q time
q).ml.df2tabTimezone[dttab;0b;1b]
time timed datetime dt_with_tz
------------------------------------------------------------------------------------------------------
0D12:10:30.000500000 -0D05:00:00.000000000 2005.02.25D03:30:00.000000000 2005.02.25D02:30:00.000000000
0D12:13:30.000200000 0D00:16:40.000000000 2015.12.22D00:00:00.000000000 2015.12.21D23:00:00.000000000
.ml.df2tab_tz deprecated
The above function was previously defined as .ml.df2tab_tz. It is still callable but will be deprecated after version 3.0.
.ml.eye
Create identity matrix
.ml.eye[n]
Where
n is the width/height of the identity matrix
returns an identity matrix of height/width n.
q).ml.eye 5
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
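One way to picture the result is as a comparison of every row index against every column index. The sketch below assumes a float result, which the output above does not confirm; the name eyeSketch is hypothetical and this is not necessarily how the toolkit builds the matrix.

```q
/ hypothetical sketch: entry is 1 where row index equals column index
/ cast to float is an assumption about the desired element type
eyeSketch:{[n] "f"$(til n)=/:til n}

m:eyeSketch 3
m~flip m        / 1b, the matrix is symmetric
sum each m      / each row holds a single 1
```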
.ml.iMax
Index of maximum element of a list
.ml.iMax[array]
Where
array is a numerical array of values
returns the index of the maximum element of the array.
q)show a:8?5.
3.883438 4.96977 2.447749 3.253555 4.246108 4.54695 1.381171 2.273137
q).ml.iMax a
1
q)show b:8?100
23 8 12 24 6 36 68 37
q).ml.iMax b
6
.ml.imax deprecated
The above function was previously defined as .ml.imax. It is still callable but will be deprecated after version 3.0.
.ml.iMin
Index of minimum element of a list
.ml.iMin[array]
Where
array is a numerical array of values
returns the index of the minimum element of the array.
q)show a:8?10.
0.6916353 8.045142 7.619755 4.599266 0.3341879 6.43216 6.177459 4.751895
q).ml.iMin a
4
q)show b:8?50
22 45 3 22 3 5 40 26
q).ml.iMin b
2
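Both .ml.iMax and .ml.iMin can be thought of as a search for the extreme value using the q find operator. The sketch below is an illustration under that assumption; the names iMaxSketch and iMinSketch are hypothetical.

```q
/ hypothetical sketches: find (?) returns the first matching index
iMaxSketch:{x?max x}
iMinSketch:{x?min x}

b:22 45 3 22 3 5 40 26
iMaxSketch b    / 1
iMinSketch b    / 2, the first of the two tied minima
```

As the second example above also shows, when the extreme value occurs more than once the index of its first occurrence is returned.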
.ml.imin deprecated
The above function was previously defined as .ml.imin. It is still callable but will be deprecated after version 3.0.
.ml.linearSpace
Array of evenly-spaced values
.ml.linearSpace[start;end;n]
Where
start is the start of the interval (inclusive)
end is the end of the interval (inclusive)
n is the number of evenly-spaced values to be created
returns a vector of n evenly-spaced values between start and end.
q).ml.linearSpace[10;20;9]
10 11.25 12.5 13.75 15 16.25 17.5 18.75 20
q).ml.linearSpace[0.5;15.25;12]
0.5 1.840909 3.181818 4.522727 5.863636 7.204545 8.545455 9.886364 11.22727 1..
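Because both endpoints are included, n values are separated by n-1 equal gaps. A minimal sketch of that arithmetic follows; the name linSketch is hypothetical and this is not necessarily the toolkit's definition.

```q
/ hypothetical sketch: n points, so the gap is (end-start)%(n-1)
linSketch:{[start;end;n] start+((end-start)%n-1)*til n}

linSketch[10;20;9]          / 10 11.25 12.5 13.75 15 16.25 17.5 18.75 20
count linSketch[0.5;15.25;12]   / 12
```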
.ml.linspace deprecated
The above function was previously defined as .ml.linspace. It is still callable but will be deprecated after version 3.0.
.ml.range
Range of values
.ml.range[array]
Where
array is a numerical array
returns the range of its values, that is the maximum minus the minimum. For a matrix, the range of each column is returned.
q).ml.range 1000?100000f
99742.37
q)show mat:(2 2#4?1f)
0.04492896 0.1786355
0.9694828 0.8964098
q).ml.range mat
0.9245539 0.7177742
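The column-wise result for a matrix follows from how max and min reduce a list of rows in q. A minimal sketch, with the hypothetical name rangeSketch and not necessarily the toolkit's definition:

```q
/ hypothetical sketch: maximum minus minimum
/ on a matrix, max and min reduce across rows, giving one value per column
rangeSketch:{max[x]-min x}

rangeSketch 3 1 4 1 5      / 4
rangeSketch (1 10;4 2)     / 3 8
```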
.ml.shape
Shape of a matrix
.ml.shape[matrix]
Where
matrix is a matrix of values
returns its shape as a list of dimensions.
q).ml.shape 10
`long$()
q).ml.shape enlist 10
,1
q).ml.shape til 10
,10
q).ml.shape enlist til 10
1 10
q).ml.shape 2 5#til 10
2 5
q).ml.shape 2 3 4#til 24
2 3 4
q).ml.shape ([]c1:til 10;c2:0)
10 2
Behavior of .ml.shape is undefined for ragged/jagged arrays.
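The results above, including the undefined behavior for ragged input, are consistent with inspecting only the first item at each level of nesting. The sketch below illustrates that idea; the name shapeSketch is hypothetical and this is an assumption about the approach, not a statement of the toolkit's source.

```q
/ hypothetical sketch: repeatedly take the first item until an atom is reached,
/ count each intermediate value, then drop the trailing count of the atom
shapeSketch:{-1_count each first scan x}

shapeSketch 2 3 4#til 24     / 2 3 4
shapeSketch til 10           / ,10
shapeSketch (1 2 3;4 5)      / 2 3, the shorter second row is never inspected
```

Since only the first item at each depth is counted, a ragged list is reported as if every item matched the first.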
.ml.tab2df
Convert q table to Pandas dataframe
.ml.tab2df[tab]
Where
tab is a table
returns a Pandas dataframe.
q)n:5
/ q table for input
q)table:([]x:n?10000f;x1:1+til n;x2:reverse til n;x3:n?100f)
x x1 x2 x3
-----------------------
2631.44 1 4 78.71917
1118.109 2 3 80.09356
3250.627 3 2 16.71013
// Convert to pandas dataframe and show it is an embedPy object
q)show pdf:.ml.tab2df[table]
{[f;x]embedPy[f;x]}[foreign]enlist
// Display the Python form of the dataframe
q)print pdf
x x1 x2 x3
0 2631.439704 1 4 78.719172
1 1118.109056 2 3 80.093563
2 3250.627243 3 2 16.710134
.ml.trainTestSplit
Split into training and test sets
.ml.trainTestSplit[data;target;size]
Where
data is a matrix, table or list
target is a vector of target values with the same count as data
size is the size of the testing set, given as a fraction of the data (e.g. 0.2 for 20%)
returns a dictionary with keys xtrain, ytrain, xtest and ytest, containing the data and target values split into training and testing sets, with the fraction size of the data held in the testing set.
q)mat:(30 20)#1000?10f
q)target:rand each 30#0b
q).ml.trainTestSplit[mat;target;0.2] / split the data such that 20% is contained in the test set
xtrain| (2.02852 2.374546 1.083376 2.59378 6.698505 6.675959 4.120228 2.63468..
ytrain| 110010100101111001110000b
xtest | (8.379916 8.986609 7.06074 2.067817 5.468488 4.103195 0.1590803 0.259..
ytest | 000001b
q)tab:([]30?1f;30?`1;30?10)
q)target:30?1f / redefine target as floats, matching the ytrain/ytest output shown
q).ml.trainTestSplit[tab;target;0.2]
xtrain| +`x`x1`x2!(0.1659182 0.5316555 0.9658597 0.6659117 0.4921318 0.580703..
ytrain| 0.4449418 0.6637015 0.77852 0.8229043 0.5678825 0.9534722 0.2448434 0..
xtest | +`x`x1`x2!(0.6913239 0.3921862 0.2904501 0.6536423 0.6517715 0.961030..
ytest | 0.9861457 0.752895 0.2695986 0.122979 0.4412847 0.4952119
q)list:asc 30?1f
q).ml.trainTestSplit[list;target;0.2]
xtrain| 0.4251052 0.6419072 0.5701215 0.4231011 0.327041 0.1573152 0.3414573 ..
ytrain| 0.5029018 0.05230331 0.628313 0.5766565 0.6314705 0.3266584 0.9624403..
xtest | 0.3692275 0.4192985 0.1573064 0.9121564 0.28237 0.07992544
ytest | 0.3821462 0.9177309 0.3572827 0.1110881 0.9807582 0.5132051
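The split can be pictured as shuffling the row indices once and cutting the shuffled list at the training size, so that data and target stay aligned. The sketch below is a minimal illustration under that assumption; the name splitSketch and its local variable names are hypothetical, and it is not presented as the toolkit's implementation. It is written in script form, with continuation lines indented.

```q
/ hypothetical sketch of a shuffled train/test split
splitSketch:{[data;target;size]
  n:count data;
  idx:neg[n]?n;              / random permutation of the row indices
  nTrain:floor n*1-size;     / number of records kept for training
  tr:nTrain#idx;             / indices of the training records
  te:nTrain _ idx;           / indices of the testing records
  `xtrain`ytrain`xtest`ytest!(data tr;target tr;data te;target te)}

r:splitSketch[til 30;til 30;0.2]
count each r                 / 24 24 6 6
```

Indexing with the same index lists works for a list, a matrix (as a list of rows) or a table, which is why one sketch covers all three input types listed above.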
.ml.traintestsplit deprecated
The above function was previously defined as .ml.traintestsplit. It is still callable but will be deprecated after version 3.0.