PyKX Introduction Notebook

This notebook introduces the capabilities and functionality that PyKX makes available to you.

To follow along, please download this notebook using the following link.

This notebook is broken into the following sections:

How to import PyKX
The basic data structures of PyKX
Accessing and creating PyKX objects
Running analytics on objects in PyKX

Welcome to PyKX!

PyKX is a Python library built and maintained for interfacing seamlessly with kdb+, the world's fastest time-series database technology, and its underlying vector programming language, q.

Its aim is to provide you, and all Python data engineers and data scientists, with an interface to efficiently apply analytics on large volumes of on-disk and in-memory data, in a fraction of the time of competitor libraries.

How to import PyKX

To access PyKX and its functions, import it in your Python code as follows:

In [2]:

import pykx as kx
kx.q.system.console_size = [10, 80]

The import name is shortened to kx to improve the readability of code that uses PyKX, and this is the intended standard for the library. As such, we recommend that you always use import pykx as kx when using the library.

Below we load the additional libraries used throughout this notebook.

In [3]:

import numpy as np
import pandas as pd

The basic data structures of PyKX

Central to your interaction with PyKX are the various data types supported by the library. Fundamentally, PyKX is built atop q, a fully featured functional programming language which provides small-footprint data structures that can be used in analytic calculations and in the creation of highly performant databases. The types shown below are generated from equivalent Python types but, as you will see throughout this notebook, this is only one of several ways to create them.

In this section we describe the basic elements which you will come into contact with as you traverse the library, and explain why and how they are different.

PyKX Atomic Types

In PyKX an atom denotes a single irreducible value of a specific type. For example, you may come across pykx.FloatAtom or pykx.DateAtom objects, which can be generated from an equivalent Pythonic representation as follows.

In [4]:

kx.FloatAtom(1.0)

Out[4]:

pykx.FloatAtom(pykx.q('1f'))

In [5]:

from datetime import date
kx.DateAtom(date(2020, 1, 1))

Out[5]:

pykx.DateAtom(pykx.q('2020.01.01'))

PyKX Vector Types

Like atoms, vectors hold values of a single specified type, but a vector is a collection of multiple elements rather than a single value. Along with the lists described below, these objects form the basis for the majority of the other important data structures that you will encounter, including dictionaries and tables.

Typed vector objects provide significant benefits over, for example, Python lists when it comes to applying analytics. Similar to NumPy, PyKX gains from the underlying speed of its analytic engine when operating on these strictly typed objects.

Vector objects are always 1-D and as such can be indexed along a single axis.

In the following example we create PyKX vectors from common Python equivalents, namely NumPy and pandas objects.

In [6]:

kx.IntVector(np.array([1, 2, 3, 4], dtype=np.int32))

Out[6]:

pykx.IntVector(pykx.q('1 2 3 4i'))

In [7]:

kx.toq(pd.Series([1, 2, 3, 4]))

Out[7]:

pykx.LongVector(pykx.q('1 2 3 4'))
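
To make the idea of a strictly typed, one-dimensional vector more concrete, here is a small sketch using Python's standard-library array module. It is only an analogy for how typed storage behaves and is not how PyKX stores its vectors; the variable names are illustrative.

```python
from array import array

# A typed array stores every element with the same machine type ('i' = C int),
# loosely comparable to a typed vector.
typed = array('i', [1, 2, 3, 4])

# A plain Python list accepts any mix of element types.
untyped = [1, 2.5, 'a']

try:
    typed.append(2.5)      # rejected: a float does not fit an int array
    accepted = True
except TypeError:
    accepted = False

print(typed.typecode, list(typed), accepted)
```

Because every element shares one type, operations over the typed container never need to check element types one by one, which is the property the text above attributes to typed vectors.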

PyKX Lists

A List in PyKX can loosely be described as an untyped vector object. Unlike vectors, which are optimised for the performance of analytics, lists are more commonly used for storing reference information or matrix data.

Unlike vector objects, which are by definition 1-D in shape, lists can be ragged N-dimensional objects. This makes them useful for storing some complex data structures, but limits their performance in data-access and data-modification tasks.

In [8]:

kx.List([[1, 2, 3], [1.0, 1.1, 1.2], ['a', 'b', 'c']])

Out[8]:

pykx.List(pykx.q('
1 2   3  
1 1.1 1.2
a b   c  
'))

PyKX Dictionaries

A dictionary in PyKX is a direct mapping between a list of keys and a list of values; the two lists must have the same count. While it can be thought of as a set of key-value pairs, it is physically stored as a pair of lists.

In [9]:

print(kx.Dictionary({'x': [1, 2, 3], 'x1': np.array([1, 2, 3])}))

x | 1 2 3
x1| 1 2 3
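
The "pair of lists" storage described above can be sketched in plain Python. This is an illustration of the idea only, not PyKX's implementation, and the lookup helper is a hypothetical name introduced for this example.

```python
# Keys and values are held as two separate lists of equal count.
keys = ['x', 'x1']
values = [[1, 2, 3], [1, 2, 3]]

# The mapping is only valid when both lists have the same count.
assert len(keys) == len(values)

def lookup(key):
    """Find the position of the key, then return the value at that position."""
    return values[keys.index(key)]

print(lookup('x1'))

# The same data expressed as an ordinary Python dict.
mapping = dict(zip(keys, values))
print(mapping)
```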

PyKX Tables

Tables in PyKX are first-class typed entities which live in memory. They can fundamentally be described as a collection of named columns implemented as a dictionary. This mapping construct means that tables in PyKX are column-oriented, which makes analytic operations on specified columns much faster than would be the case for a row-oriented relational database equivalent.

Tables in PyKX come in many forms, but the key table types are as follows:

pykx.Table
pykx.KeyedTable
pykx.SplayedTable
pykx.PartitionedTable

In this section we deal only with the first two of these, which are the in-memory table types. As will be discussed in later sections, Splayed and Partitioned tables are memory-mapped on-disk data structures; these are derivations of the pykx.Table and pykx.KeyedTable type objects.

`pykx.Table`

In [10]:

print(kx.Table([[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']], columns = ['col1', 'col2', 'col3']))

col1 col2 col3
--------------
1    2    a   
2    3    b   
3    4    c

In [11]:

print(kx.Table(data = {'col1': [1, 2, 3], 'col2': [2 , 3, 4], 'col3': ['a', 'b', 'c']}))

col1 col2 col3
--------------
1    2    a   
2    3    b   
3    4    c
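
The two cells above build the same table from row-wise and column-wise input. As a plain-Python sketch of what "a collection of named columns implemented as a dictionary" means, the snippet below transposes row-wise data into a dictionary of columns. It illustrates the layout only and makes no claim about how PyKX performs the conversion internally.

```python
rows = [[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']]
names = ['col1', 'col2', 'col3']

# zip(*rows) transposes the row-wise data into one tuple per column.
columns = {name: list(col) for name, col in zip(names, zip(*rows))}
print(columns)

# A column-wise aggregate only needs to touch a single list.
col2_mean = sum(columns['col2']) / len(columns['col2'])
print(col2_mean)
```

With this layout an aggregate over one column reads one contiguous list and never visits the other columns, which is the advantage of column orientation mentioned earlier.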

`pykx.KeyedTable`

In [12]:

kx.Table(data = {'x': [1, 2, 3], 'x1': [2, 3, 4], 'x2': ['a', 'b', 'c']}).set_index(['x'])

Out[12]:

	x1	x2
x
1	2	a
2	3	b
3	4	c

Other Data Types

The above types cover the majority of the important type structures in PyKX, but there are many others which you will encounter as you use the library. Below we outline some of the important ones that you will run into through the rest of this notebook.

`pykx.Lambda`

A pykx.Lambda is the most basic kind of function within PyKX. Lambdas take between 0 and 8 parameters and are the building blocks for most analytics written by users when interacting with data from PyKX.

In [13]:

pykx_lambda = kx.q('{x+y}')
type(pykx_lambda)

Out[13]:

pykx.wrappers.Lambda

In [14]:

pykx_lambda(1, 2)

Out[14]:

pykx.LongAtom(pykx.q('3'))

`pykx.Projection`

Similar to functools.partial, functions in PyKX can have some of their parameters fixed in advance, resulting in a new function which is called a projection. When this projection is called, the fixed parameters are no longer required and cannot be provided.

If the original function has n total parameters and m of them are provided, the result is a function (projection) that requires the user to supply the remaining n-m parameters.

In [15]:

projection = kx.q('{x+y}')(1)
projection

Out[15]:

pykx.Projection(pykx.q('{x+y}[1]'))

In [16]:

projection(2)

Out[16]:

pykx.LongAtom(pykx.q('3'))
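
Since the text compares projections to functools.partial, the pure-Python counterpart of the example above may be a useful reference. The function name add is illustrative; the sketch shows the n-m parameter rule using the standard library only and does not involve PyKX.

```python
from functools import partial
from inspect import signature

def add(x, y):
    return x + y

# Fix the first parameter in advance, comparable to the q projection {x+y}[1].
add_one = partial(add, 1)

# add has n = 2 parameters and m = 1 was fixed, so n - m = 1 remains.
remaining = len(signature(add_one).parameters)
result = add_one(2)

print(remaining, result)
```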

Accessing and creating PyKX objects

Now that we have seen some of the PyKX object types that you will encounter, how are they created in real-world scenarios?

Creating PyKX objects from Pythonic data types

One of the most common ways to generate PyKX data is through conversion from equivalent Pythonic data types. PyKX natively supports conversions to and from the following common Python data formats:

Python
Numpy
Pandas
PyArrow

In each of the above cases, PyKX objects are generated using the kx.toq function.

In [17]:

pydict = {'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': 2}
kx.toq(pydict)

Out[17]:



a	1 2 3
b	`a`b`c
c	2

In [18]:

nparray = np.array([1, 2, 3, 4], dtype = np.int32)
kx.toq(nparray)

Out[18]:

pykx.IntVector(pykx.q('1 2 3 4i'))

In [19]:

pdframe = pd.DataFrame(data = {'a':[1, 2, 3], 'b': ['a', 'b', 'c']})
kx.toq(pdframe)

Out[19]:

	a	b

0	1	a
1	2	b
2	3	c

Random data generation

PyKX provides a module for creating random data of user-specified PyKX types or their equivalent Python types. Random data is useful when prototyping analytics and is used extensively within our documentation when creating test examples.

As a first example, you can generate a vector of 1,000,000 random floating-point values between 0 and 1 as follows:

In [20]:

kx.random.random(1000000, 1.0)

Out[20]:

pykx.FloatVector(pykx.q('0.3927524 0.5170911 0.5159796 0.4066642 0.1780839 0.3017723 0.785033 0.534709..'))

If instead you wish to choose values randomly from a list, pass that list as the second argument to the function:

In [21]:

kx.random.random(5, [kx.LongAtom(1), ['a', 'b', 'c'], np.array([1.1, 1.2, 1.3])])

Out[21]:

pykx.List(pykx.q('
1.1 1.2 1.3
1
1.1 1.2 1.3
1
`a`b`c
'))

Random data does not only come in 1-dimensional forms, however: passing a list as the first argument allows you to create multi-dimensional PyKX Lists. The examples below additionally use a PyKX trick whereby nulls and infinities can be used to generate random data across the full allowable range.

In [22]:

kx.random.random([2, 5], kx.GUIDAtom.null)

Out[22]:

pykx.List(pykx.q('
9b19ab9c-b26d-d6b3-a8fa-267ba0620848 d8d6c050-964e-6247-e2cd-bf9435389b9a 1c4..
a68f5b00-754e-9863-04aa-8b59cc4e3122 72969cc8-4445-451b-9266-7770a60c3120 0c7..
'))

In [23]:

kx.random.random([2, 3, 4], kx.IntAtom.inf)

Out[23]:

pykx.List(pykx.q('
1837510540 373968399  35818431  1421474592  424239201  1727064393 250148680 1..
1566069007 1773121422 2104411811 1441846567 103906494  315107819  931560883  ..
'))

Finally, you can set the seed for random data generation explicitly, which gives you consistency across the generated objects. This can be done globally or for individual function calls.

In [24]:

kx.random.seed(10)
kx.random.random(10, 2.0)

Out[24]:

pykx.FloatVector(pykx.q('0.1782082 1.669039 0.7243899 1.999868 0.7675971 1.723838 0.1836728 0.5061767 ..'))

In [25]:

kx.random.random(10, 2.0, seed = 10)

Out[25]:

pykx.FloatVector(pykx.q('0.1782082 1.669039 0.7243899 1.999868 0.7675971 1.723838 0.1836728 0.5061767 ..'))
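
The difference between a global seed and a per-call seed can be sketched with Python's own random module. This is an analogy only: the draw helper and its parameters are hypothetical, and the numbers it produces are unrelated to those generated by PyKX.

```python
import random

def draw(n, upper, seed=None):
    """Return n uniform floats between 0 and upper; a seed makes the call repeatable."""
    # With a seed, use a private generator; otherwise use the module-level one.
    rng = random.Random(seed) if seed is not None else random
    return [rng.uniform(0, upper) for _ in range(n)]

# Global seeding: every later unseeded call continues from this state.
random.seed(10)
first = draw(10, 2.0)
random.seed(10)
second = draw(10, 2.0)

# Per-call seeding: the global state is left untouched.
per_call_a = draw(10, 2.0, seed=10)
per_call_b = draw(10, 2.0, seed=10)

print(first == second, per_call_a == per_call_b)
```

A per-call seed is convenient in tests because it keeps one example reproducible without changing the random state seen by the rest of the session.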

Running q code to generate data

As mentioned in the introduction, PyKX provides an entry point to the vector programming language q, so users of PyKX can execute q code directly within a Python session. This is done through calls to kx.q.

Create some q data:

In [26]:

kx.q('0 1 2 3 4')

Out[26]:

pykx.LongVector(pykx.q('0 1 2 3 4'))

In [27]:

kx.q('([idx:desc til 5]col1:til 5;col2:5?1f;col3:5?`2)')

Out[27]:

	col1	col2	col3
idx
4	0	0.8619188	ol
3	1	0.09183638	mg
2	2	0.2530883	cm
1	3	0.2504566	cc
0	4	0.7517286	jg

Apply arguments to a user-specified function x+y:

In [28]:

kx.q('{x+y}', kx.LongAtom(1), kx.LongAtom(2))

Out[28]:

pykx.LongAtom(pykx.q('3'))

Read data from a CSV file

A lot of the data that you run into in data analysis tasks comes in the form of CSV files. Similar to Pandas, PyKX provides a CSV reader, called via kx.q.read.csv. In the following cell we create a CSV file to be read in using PyKX.

In [29]:

import csv

with open('pykx.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    field = ["name", "age", "height", "country"]
    
    writer.writerow(field)
    writer.writerow(["Oladele Damilola", "40", "180.0", "Nigeria"])
    writer.writerow(["Alina Hricko", "23", "179.2", "Ukraine"])
    writer.writerow(["Isabel Walter", "50", "179.5", "United Kingdom"])

In [30]:

kx.q.read.csv('pykx.csv', types = {'age': kx.LongAtom, 'country': kx.SymbolAtom})

Out[30]:

	name	age	height	country

0	"Oladele Damilola"	40	180e	Nigeria
1	"Alina Hricko"	23	179.2e	Ukraine
2	"Isabel Walter"	50	179.5e	United Kingdom
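
The types argument above assigns a type to individual columns while the remaining columns fall back to a default. The same idea can be sketched with the standard-library csv module; the converters dictionary is a hypothetical stand-in for this example and is not a PyKX feature.

```python
import csv
import io

# A small in-memory CSV so that nothing is written to disk.
text = "name,age,height,country\nAlina Hricko,23,179.2,Ukraine\n"

# Per-column casts; columns not listed here are kept as strings.
converters = {'age': int, 'height': float}

reader = csv.DictReader(io.StringIO(text))
records = [
    {col: converters.get(col, str)(val) for col, val in row.items()}
    for row in reader
]
print(records)
```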

In [31]:

import os
os.remove('pykx.csv')

Querying external Processes via IPC

One of the most common usage patterns you will encounter in organisations with access to data in kdb+/q is querying this data from an external server process. In the example below we assume that you have q installed in addition to PyKX; see here to install q alongside the license access for PyKX.

First we set up a q/kdb+ server listening on port 5000. It is populated with some data, in the form of a table tab, once a connection has been established.

In [32]:

import subprocess
import time

try:
    # Launch q as a child process listening on port 5000; PyKXReimport is used
    # so that PyKX's own environment settings do not interfere with it
    with kx.PyKXReimport():
        proc = subprocess.Popen(
            ('q', '-p', '5000')
        )
    # Give the q process a moment to start listening
    time.sleep(2)
except Exception as err:
    raise kx.QError('Unable to create q process on port 5000') from err
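
The fixed time.sleep(2) above is a simple way to wait for the server to start. As a hedged alternative, the sketch below polls until a TCP port accepts connections. The helper name wait_for_port and its parameters are hypothetical and not part of PyKX, and the demonstration uses a throwaway local listener rather than a real q process.

```python
import socket
import time

def wait_for_port(port, host="127.0.0.1", timeout=5.0, interval=0.1):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet: wait briefly and try again.
            time.sleep(interval)
    return False

# Demonstration against a temporary local listener (stand-in for a server).
with socket.socket() as listener:
    listener.bind(("127.0.0.1", 0))     # port 0 lets the OS pick a free port
    listener.listen()
    free_port = listener.getsockname()[1]
    ready = wait_for_port(free_port)

print(ready)
```

In the notebook's setting you might call such a helper with port 5000 after starting the q process, instead of sleeping for a fixed time.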

Once a q process is available you can establish a connection to it for synchronous query execution as follows

In [33]:

conn = kx.SyncQConnection(port = 5000)

You can now run q commands against the q server

In [34]:

conn('tab:([]col1:100?`a`b`c;col2:100?1f;col3:100?0Ng)')
conn('select from tab where col1=`a')

Out[34]:

	col1	col2	col3

0	a	0.01974141	ddb87915-b672-2c32-a6cf-296061671e9d
1	a	0.5611439	580d8c87-e557-0db1-3a19-cb3a44d623b1
2	a	0.8685452	2d948578-e9d6-79a2-8207-9df7a71f0b3b
3	a	0.3460797	cddeceef-9ee9-3847-9172-3e3d7ab39b26
4	a	0.5046331	1c22a468-9492-2173-9e4f-9003a23d02b7
5	a	0.765905	5e9cd21b-88c5-bbf5-7215-6409e115a2a4
6	a	0.8794685	3462beab-42ee-ccad-989b-8d69f070dffc
7	a	0.02487862	bc150163-c551-0eba-8871-9767f5c0e3d5
...	...	...	...
36	a	0.9929108	03a9b290-95c8-c3b8-fb9a-9ac9874763b8

37 rows × 3 columns

Or use the PyKX query API

In [35]:

conn.qsql.select('tab', where=['col1=`a', 'col2<0.3'])

Out[35]:

	col1	col2	col3

0	a	0.01974141	ddb87915-b672-2c32-a6cf-296061671e9d
1	a	0.02487862	bc150163-c551-0eba-8871-9767f5c0e3d5
2	a	0.2073435	ee853957-d502-d30d-5945-bf8c97022332
3	a	0.2188574	d9a3e171-b1cf-0271-507a-0fba0b52e6ff
4	a	0.1451855	ea4d0269-375c-d73b-96f0-6bb6334ca423
5	a	0.1497004	1cce6bdd-e34b-ba4f-8c01-31d098d81221
6	a	0.166486	6417d4b3-3fc6-e35a-1c34-8c5c3327b1e8
7	a	0.2643322	f294c3cb-a6da-e15d-c8e0-3a848d2abf10
8	a	0.07841939	020715aa-8ffa-e1d3-9c68-3ad7919d4f5e

Or use PyKX's context interface to run SQL server side if it's available to you

In [36]:

conn('\\l s.k_')
conn.sql('SELECT * FROM tab where col2>=0.5')

Out[36]:

	col1	col2	col3

0	a	0.5611439	580d8c87-e557-0db1-3a19-cb3a44d623b1
1	a	0.8685452	2d948578-e9d6-79a2-8207-9df7a71f0b3b
2	b	0.7716917	52cb20d9-f12c-9963-2829-3c64d8d8cb14
3	a	0.5046331	1c22a468-9492-2173-9e4f-9003a23d02b7
4	c	0.6014692	7ea4d431-4dec-3017-3d13-cc9ef7f1c0ee
5	c	0.5000071	782c5346-f5f7-b90e-c686-8d41fa85233b
6	c	0.8392881	245f5516-0cb8-391a-e1e5-fadddc8e54ba
7	b	0.5938637	e30bab29-2df0-3fb0-535f-58d1e7bd83c0
...	...	...	...
55	b	0.8236115	f2c41bca-67df-aa6c-4730-bca38cbd6825

56 rows × 3 columns

Finally the q server used for this demonstration can be shut down

In [37]:

proc.kill()

Running analytics on objects in PyKX

Like many Python libraries, including Numpy and Pandas, PyKX provides a number of ways to run analytics on its data, using both analytics defined within the library and those you have written yourself.

Using in-built methods on PyKX Vectors

When interacting with PyKX Vectors you may wish to gain insights into these objects by applying basic analytics, such as calculating the mean, median or mode of the vector.

In [38]:

q_vector = kx.random.random(1000, 10.0)

In [39]:

q_vector.mean()

Out[39]:

pykx.FloatAtom(pykx.q('4.984157'))

In [40]:

q_vector.max()

Out[40]:

pykx.FloatAtom(pykx.q('9.998212'))

The above is useful for basic analysis but will not be sufficient for more bespoke analytics on these vectors. To give you more control over the analytics run, you can also use the apply method.

In [41]:

def bespoke_function(x, y):
    return x*y

q_vector.apply(bespoke_function, 5)

Out[41]:

pykx.FloatVector(pykx.q('31.74132 38.3376 46.40922 10.17963 38.73944 48.33864 41.12562 45.44382 32.290..'))

Using in-built methods on PyKX Tables

In addition to the vector processing capabilities of PyKX, the ability to operate on tabular structures is also important. Described in greater depth within the Pandas-Like API documentation here, these methods allow you to apply functions and gain insights into your data in a way that is familiar.

In the example below you will use combinations of the most commonly used elements of this Table API, operating on the following table:

In [42]:

N = 1000000
example_table = kx.Table(data = {
    'sym' : kx.random.random(N, ['a', 'b', 'c']),
    'col1' : kx.random.random(N, 10.0),
    'col2' : kx.random.random(N, 20)
    }
)
example_table

Out[42]:

	sym	col1	col2

0	b	7.782944	6
1	c	0.5899977	17
2	c	2.580528	8
3	b	5.651351	10
4	b	2.336329	11
5	b	2.87167	17
6	c	9.705893	9
7	a	5.729889	8
...	...	...	...
999999	c	8.862285	6

1,000,000 rows × 3 columns

You can search for and filter data within your tables using loc, similar to how this is done in Pandas:

In [43]:

example_table.loc[example_table['sym'] == 'a']

Out[43]:

	sym	col1	col2

0	a	5.729889	8
1	a	4.396508	13
2	a	0.7636906	19
3	a	9.904306	17
4	a	1.439738	10
5	a	2.898631	19
6	a	2.360396	2
7	a	1.932728	12
...	...	...	...
332823	a	6.653308	18

332,824 rows × 3 columns

This behavior is also available when retrieving data from a table by indexing, that is through the __getitem__ method, as you can see here:

In [44]:

example_table[example_table['sym'] == 'b']

Out[44]:

	sym	col1	col2

0	b	7.782944	6
1	b	5.651351	10
2	b	2.336329	11
3	b	2.87167	17
4	b	2.917054	2
5	b	7.093562	18
6	b	1.715391	10
7	b	4.231884	0
...	...	...	...
333014	b	9.361253	17

333,015 rows × 3 columns

You can additionally set the index columns of the table. For PyKX tables this converts the table from a pykx.Table object to a pykx.KeyedTable object.

In [45]:

example_table.set_index('sym')

Out[45]:

	col1	col2
sym
b	7.782944	6
c	0.5899977	17
c	2.580528	8
b	5.651351	10
b	2.336329	11
b	2.87167	17
c	9.705893	9
a	5.729889	8
...	...	...
c	8.862285	6

1,000,000 rows × 3 columns

In addition to basic data manipulation such as index setting, you also get access to analytic capabilities, such as the calculation of basic statistics like the mean and median, as demonstrated here:

In [46]:

print('mean:')
print(example_table.mean(numeric_only = True))

print('median:')
print(example_table.median(numeric_only = True))

mean:
col1| 4.998412
col2| 9.497452
median:
col1| 4.996685
col2| 9

You can make use of the groupby method, which groups the PyKX tabular data so that analytics can then be applied to each group.

As a first example, let's group the dataset on the sym column and then calculate the mean of each column for each sym.

In [47]:

example_table.groupby('sym').mean()

Out[47]:

	col1	col2
sym
a	5.00519	9.49375
b	5.000742	9.501077
c	4.989338	9.497527

As an extension to the above groupby, you can now consider a more complex example which makes use of numpy to run some calculations on the PyKX data. You will see later that this can be simplified further in this specific use case.

In [48]:

def apply_func(x):
    nparray = x.np()
    return np.sqrt(nparray).mean()

example_table.groupby('sym').apply(apply_func)

Out[48]:

	col1	col2
sym
a	2.109397	2.859095
b	2.108571	2.860037
c	2.105694	2.859527
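
To show what the grouped apply above computes, here is a minimal plain-Python sketch of the same split, apply and combine steps on a tiny hand-made dataset. The data values are invented for illustration, and the code mirrors the logic only, not PyKX's execution.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

syms = ['a', 'b', 'a', 'b']
col1 = [4.0, 9.0, 16.0, 25.0]

# Split: collect the values of col1 belonging to each sym.
groups = defaultdict(list)
for sym, value in zip(syms, col1):
    groups[sym].append(value)

# Apply and combine: mean of the square roots within each group.
result = {sym: mean(sqrt(v) for v in values) for sym, values in groups.items()}
print(result)
```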

Time-series specific joining of data can be completed using merge_asof joins. In this example we create two tables with temporal information, namely a trades table and a quotes table.

In [49]:

trades = kx.Table(data={
    "time": [
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.030"),
        pd.Timestamp("2016-05-25 13:30:00.041"),
        pd.Timestamp("2016-05-25 13:30:00.048"),
        pd.Timestamp("2016-05-25 13:30:00.049"),
        pd.Timestamp("2016-05-25 13:30:00.072"),
        pd.Timestamp("2016-05-25 13:30:00.075")
    ],
    "ticker": [
       "GOOG",
       "MSFT",
       "MSFT",
       "MSFT",
       "GOOG",
       "AAPL",
       "GOOG",
       "MSFT"
   ],
   "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
   "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
quotes = kx.Table(data={
   "time": [
       pd.Timestamp("2016-05-25 13:30:00.023"),
       pd.Timestamp("2016-05-25 13:30:00.038"),
       pd.Timestamp("2016-05-25 13:30:00.048"),
       pd.Timestamp("2016-05-25 13:30:00.048"),
       pd.Timestamp("2016-05-25 13:30:00.048")
   ],
   "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
   "price": [51.95, 51.95, 720.77, 720.92, 98.0],
   "quantity": [75, 155, 100, 100, 100]
})

print('trades:')
display(trades)
print('quotes:')
display(quotes)

trades:

	time	ticker	bid	ask

0	2016.05.25D13:30:00.023000000	GOOG	720.5	720.93
1	2016.05.25D13:30:00.023000000	MSFT	51.95	51.96
2	2016.05.25D13:30:00.030000000	MSFT	51.97	51.98
3	2016.05.25D13:30:00.041000000	MSFT	51.99	52f
4	2016.05.25D13:30:00.048000000	GOOG	720.5	720.93
5	2016.05.25D13:30:00.049000000	AAPL	97.99	98.01
6	2016.05.25D13:30:00.072000000	GOOG	720.5	720.88
7	2016.05.25D13:30:00.075000000	MSFT	52.01	52.03

quotes:

	time	ticker	price	quantity

0	2016.05.25D13:30:00.023000000	MSFT	51.95	75
1	2016.05.25D13:30:00.038000000	MSFT	51.95	155
2	2016.05.25D13:30:00.048000000	GOOG	720.77	100
3	2016.05.25D13:30:00.048000000	GOOG	720.92	100
4	2016.05.25D13:30:00.048000000	AAPL	98f	100

When applying the asof join you can additionally use named arguments to distinguish which table each column originates from. In this case columns are suffixed with _trades and _quotes.

In [50]:

trades.merge_asof(quotes, on='time', suffixes=('_trades', '_quotes'))

Out[50]:

	time	ticker_trades	bid	ask	ticker_quotes	price	quantity

0	2016.05.25D13:30:00.023000000	GOOG	720.5	720.93	MSFT	51.95	75
1	2016.05.25D13:30:00.023000000	MSFT	51.95	51.96	MSFT	51.95	75
2	2016.05.25D13:30:00.030000000	MSFT	51.97	51.98	MSFT	51.95	75
3	2016.05.25D13:30:00.041000000	MSFT	51.99	52f	MSFT	51.95	155
4	2016.05.25D13:30:00.048000000	GOOG	720.5	720.93	AAPL	98f	100
5	2016.05.25D13:30:00.049000000	AAPL	97.99	98.01	AAPL	98f	100
6	2016.05.25D13:30:00.072000000	GOOG	720.5	720.88	AAPL	98f	100
7	2016.05.25D13:30:00.075000000	MSFT	52.01	52.03	AAPL	98f	100
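
The matching rule behind an as-of join on time alone can be sketched in plain Python: for each trade, take the most recent quote whose time is less than or equal to the trade time. The sketch below uses the millisecond offsets and quantities from the tables above; the helper asof_index is a hypothetical name, and this illustrates the rule rather than PyKX's implementation.

```python
from bisect import bisect_right

# Millisecond offsets after 13:30:00 taken from the trades and quotes tables.
trade_ms = [23, 23, 30, 41, 48, 49, 72, 75]
quote_ms = [23, 38, 48, 48, 48]            # must be sorted ascending
quote_qty = [75, 155, 100, 100, 100]

def asof_index(times, t):
    """Index of the last entry in sorted times that is <= t, or None."""
    i = bisect_right(times, t) - 1
    return i if i >= 0 else None

matched_qty = []
for t in trade_ms:
    i = asof_index(quote_ms, t)
    matched_qty.append(None if i is None else quote_qty[i])

print(matched_qty)
```

Because the join key here is time only, a trade can be paired with a quote for a different ticker, which is why the joined output above contains both ticker_trades and ticker_quotes columns.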

Using PyKX/q native functions

While the Pandas-Like API and the methods provided on PyKX Vectors are an effective way of applying analytics to PyKX data, the most efficient and performant way to run analytics on your data is through the PyKX/q primitives, which are available through the kx.q module.

These include functionality for the calculation of moving averages, application of asof/window joins, column reversal etc. A full list of the available functions and some examples of their usage can be found here.

Here are a few examples of how you can use these functions, broken into sections for convenience.

Mathematical functions

mavg

Calculate a series of average values across a list using a rolling window

In [51]:

kx.q.mavg(10, kx.random.random(10000, 2.0))

Out[51]:

pykx.FloatVector(pykx.q('1.469756 1.029263 0.7352848 0.5950915 0.7071875 0.8486546 0.910078 0.95322 1...'))
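
As a reference for what a rolling-window average computes, here is a minimal plain-Python sketch. It assumes that the first few results, where fewer than window values are available, are averaged over the values seen so far; the function name moving_average is illustrative and the code is not q's implementation.

```python
def moving_average(window, values):
    """Average of the last `window` values at each position (fewer at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

averages = moving_average(3, [1, 2, 3, 4, 5])
print(averages)   # [1.0, 1.5, 2.0, 3.0, 4.0]
```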

cor

Calculate the correlation between two lists

In [52]:

kx.q.cor([1, 2, 3], [2, 3, 4])

Out[52]:

pykx.FloatAtom(pykx.q('1f'))

In [53]:

kx.q.cor(kx.random.random(100, 1.0), kx.random.random(100, 1.0))

Out[53]:

pykx.FloatAtom(pykx.q('0.02687833'))

prds

Calculate the cumulative product across a supplied list

In [54]:

kx.q.prds([1, 2, 3, 4, 5])

Out[54]:

pykx.LongVector(pykx.q('1 2 6 24 120'))
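
For comparison, the same cumulative product can be written with Python's standard library. This is a sketch of the calculation itself rather than of how q evaluates prds.

```python
from itertools import accumulate
from operator import mul

# Each element is the product of all inputs up to and including that position.
running = list(accumulate([1, 2, 3, 4, 5], mul))
print(running)   # [1, 2, 6, 24, 120]
```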

Iteration functions

each

Supplied both as a standalone primitive and as a method on PyKX Lambdas, each allows you to pass the individual elements of a PyKX object to a function

In [55]:

kx.q.each(kx.q('{prd x}'), kx.random.random([5, 5], 10.0, seed=10))

Out[55]:

pykx.FloatVector(pykx.q('1033.597 377.1784 7126.713 418.3232 89.97531'))

In [56]:

kx.q('{prd x}').each(kx.random.random([5, 5], 10.0, seed=10))

Out[56]:

pykx.FloatVector(pykx.q('1033.597 377.1784 7126.713 418.3232 89.97531'))
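
The effect of each can be pictured in plain Python as applying a function to every element of a nested list in turn, instead of to the object as a whole. The sketch below uses small invented data and math.prod in place of the q function {prd x}.

```python
from math import prod

rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Applied to each row individually: one product per row.
per_row = [prod(row) for row in rows]

# Applied across all values at once: a single product.
whole = prod(v for row in rows for v in row)

print(per_row, whole)
```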

Table functions

meta

Retrieve metadata information about a table

In [57]:

qtab = kx.Table(data = {
    'x' : kx.random.random(1000, ['a', 'b', 'c']).grouped(),
    'y' : kx.random.random(1000, 1.0),
    'z' : kx.random.random(1000, kx.TimestampAtom.inf)
})

In [58]:

kx.q.meta(qtab)

Out[58]:

	t	a
c
x	"s"	g
y	"f"
z	"p"

xasc

Sort a table in ascending order by the values of a specified column

In [59]:

kx.q.xasc('z', qtab)

Out[59]:

	x	y	z

0	c	0.2660419	2000.09.17D00:27:33.222932480
1	b	0.2378591	2001.02.01D19:58:48.496586752
2	c	0.05802967	2001.05.29D15:29:16.181340160
3	c	0.9474748	2003.03.24D08:12:02.975653888
4	b	0.02726729	2004.01.31D07:25:21.959215104
5	b	0.08927731	2004.12.31D23:50:54.425055232
6	c	0.2256163	2005.07.12D10:45:38.423119872
7	b	0.1675316	2006.04.19D21:31:40.507750400
...	...	...	...
999	a	0.4414727	2292.03.15D06:41:24.638662656

1,000 rows × 3 columns
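
Sorting a column-oriented table by one column means computing a row order from that column and then applying the same order to every column. A minimal plain-Python sketch of that idea follows, using a small invented table held as a dictionary of columns; it illustrates the concept and is not how q implements xasc.

```python
table = {'x': ['c', 'b', 'a'], 'z': [3, 1, 2]}

# Row positions ordered by the values of column 'z'.
order = sorted(range(len(table['z'])), key=table['z'].__getitem__)

# Apply the same row order to every column so that rows stay aligned.
sorted_table = {name: [col[i] for i in order] for name, col in table.items()}
print(sorted_table)
```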