Welcome to PyKX!¶
PyKX is a Python library built and maintained for interfacing seamlessly with kdb+, the world's fastest time-series database technology, and its underlying vector programming language, q.
Its aim is to provide you, and all Python data engineers and data scientists, with an interface for efficiently applying analytics to large volumes of on-disk and in-memory data in a fraction of the time taken by competitor libraries.
How to import PyKX¶
To access PyKX and its functions, import it in your Python code as follows:
import pykx as kx
kx.q.system.console_size = [10, 80]
The import name is shortened to kx for readability of code that uses PyKX, and this is the intended standard for the library. As such, we recommend that you always use import pykx as kx when using the library.
Below we load additional libraries used throughout this notebook.
import numpy as np
import pandas as pd
The basic data structures of PyKX¶
Central to your interaction with PyKX are the various data types supported by the library. Fundamentally, PyKX is built atop q, a fully featured functional programming language which provides small-footprint data structures that can be used in analytic calculations and in the creation of highly performant databases. The types shown below are generated from equivalent Python types but, as you will see throughout this notebook, this is not the only way to create them.
In this section we describe the basic elements which you will come into contact with as you traverse the library and explain why and how they are different.
PyKX Atomic Types¶
In PyKX an atom denotes a single irreducible value of a specific type. For example, you may come across pykx.FloatAtom or pykx.DateAtom objects, which may have been generated as follows from an equivalent Pythonic representation.
kx.FloatAtom(1.0)
pykx.FloatAtom(pykx.q('1f'))
from datetime import date
kx.DateAtom(date(2020, 1, 1))
pykx.DateAtom(pykx.q('2020.01.01'))
PyKX Vector Types¶
Similar to atoms, vectors are a data structure composed of a collection of multiple elements of a single specified type. These objects in PyKX along with lists described below form the basis for the majority of the other important data structures that you will encounter including dictionaries and tables.
Typed vector objects provide significant benefits over Python lists, for example, when it comes to applying analytics. Similar to Numpy, PyKX gains from the underlying speed of its analytic engine when operating on these strictly typed objects.
Vector type objects are always 1-D and as such can be indexed along a single axis.
In the following example we create PyKX vectors from common Python equivalents, namely numpy and pandas objects.
kx.IntVector(np.array([1, 2, 3, 4], dtype=np.int32))
pykx.IntVector(pykx.q('1 2 3 4i'))
kx.toq(pd.Series([1, 2, 3, 4]))
pykx.LongVector(pykx.q('1 2 3 4'))
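As a rough, PyKX-free illustration of what "strictly typed" means, the sketch below uses Python's standard-library array module as a stand-in for a typed vector. The names typed and untyped are purely illustrative, and the example says nothing about how PyKX stores vectors internally.

```python
from array import array

# A typed container: every element must be a C int (typecode 'i').
typed = array('i', [1, 2, 3, 4])

# A plain Python list accepts any mixture of types.
untyped = [1, 2.5, 'a']

# Appending a float to the typed container is rejected.
try:
    typed.append(1.5)
    rejected = False
except TypeError:
    rejected = True

print(typed.typecode, list(typed), rejected)
```

Because every element shares one type, a typed container can be stored compactly and processed without per-element type checks, which is the general idea behind the performance benefit described above.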
PyKX Lists¶
A List
in PyKX can loosely be described as an untyped vector object. Unlike vectors which are optimised for the performance of analytics, lists are more commonly used for storing reference information or matrix data.
Unlike vector objects which are by definition 1-D in shape, lists can be ragged N-Dimensional objects. This makes them useful for the storage of some complex data structures but limits their performance when dealing with data-access/data modification tasks.
kx.List([[1, 2, 3], [1.0, 1.1, 1.2], ['a', 'b', 'c']])
pykx.List(pykx.q('
1 2 3
1 1.1 1.2
a b c
'))
PyKX Dictionaries¶
A dictionary in PyKX is a direct key-value mapping; the list of keys and the list of values with which they are associated must have the same count. While it can be thought of as a set of key-value pairs, it is physically stored as a pair of lists.
print(kx.Dictionary({'x': [1, 2, 3], 'x1': np.array([1, 2, 3])}))
x | 1 2 3
x1| 1 2 3
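To make the "pair of lists" description concrete, here is a minimal plain-Python sketch. The names keys, values and lookup are hypothetical and exist only for this illustration; this is not how you would access a pykx.Dictionary in practice.

```python
# Hypothetical names: keys/values mimic the two equal-length lists.
keys = ['x', 'x1']
values = [[1, 2, 3], [1, 2, 3]]

def lookup(keys, values, key):
    """Return the value stored at the same position as `key`."""
    if len(keys) != len(values):
        raise ValueError('keys and values must have the same count')
    return values[keys.index(key)]

print(lookup(keys, values, 'x1'))
```

A lookup is therefore a search in the key list followed by a positional read from the value list.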
PyKX Tables¶
Tables in PyKX are first-class typed entities which live in memory. They can be fundamentally described as a collection of named columns implemented as a dictionary. This mapping construct means that tables in PyKX are column-oriented, which makes analytic operations on specified columns much faster than would be the case for a relational database equivalent.
Tables in PyKX come in many forms but the key table types are as follows:
pykx.Table
pykx.KeyedTable
pykx.SplayedTable
pykx.PartitionedTable
In this section we deal only with the first two of these, which constitute the in-memory table types. As will be discussed in later sections, Splayed and Partitioned tables are memory-mapped on-disk data structures; these are derivations of the pykx.Table and pykx.KeyedTable type objects.
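The column-oriented idea can be sketched in plain Python by treating a table as a dictionary from column name to column values. This is only an illustration of the layout; the variable names are made up and no PyKX objects are involved.

```python
# A table sketched as a mapping from column name to column values.
table = {
    'col1': [1, 2, 3],
    'col2': [2, 3, 4],
    'col3': ['a', 'b', 'c'],
}

# A column aggregate reads a single list and never touches the others.
col2_mean = sum(table['col2']) / len(table['col2'])

# A row has to be assembled by visiting every column.
row_1 = {name: column[1] for name, column in table.items()}

print(col2_mean, row_1)
```

Aggregating one column touches one contiguous list, whereas a row-oriented layout would have to step through every record to pick out the same values.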
pykx.Table¶
print(kx.Table([[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']], columns = ['col1', 'col2', 'col3']))
col1 col2 col3
--------------
1    2    a
2    3    b
3    4    c
print(kx.Table(data = {'col1': [1, 2, 3], 'col2': [2 , 3, 4], 'col3': ['a', 'b', 'c']}))
col1 col2 col3
--------------
1    2    a
2    3    b
3    4    c
pykx.KeyedTable¶
kx.Table(data = {'x': [1, 2, 3], 'x1': [2, 3, 4], 'x2': ['a', 'b', 'c']}).set_index(['x'])
x | x1 | x2 |
---|---|---|
1 | 2 | a |
2 | 3 | b |
3 | 4 | c |
Other Data Types¶
The above types cover the majority of the important type structures in PyKX, but there are many others which you will encounter as you use the library. Below we outline some of the important ones that you will run into through the rest of this notebook.
pykx.Lambda¶
A pykx.Lambda is the most basic kind of function within PyKX. Lambdas take between 0 and 8 parameters and are the building blocks for most analytics written by users when interacting with data from PyKX.
pykx_lambda = kx.q('{x+y}')
type(pykx_lambda)
pykx.wrappers.Lambda
pykx_lambda(1, 2)
pykx.LongAtom(pykx.q('3'))
pykx.Projection¶
Similar to functools.partial, functions in PyKX can have some of their parameters fixed in advance, resulting in a new function, which is called a projection. When this projection is called, the fixed parameters are no longer required, and cannot be provided.
If the original function had n total parameters and m of them were provided, the result is a function (projection) that requires a user to input the remaining n-m parameters.
projection = kx.q('{x+y}')(1)
projection
pykx.Projection(pykx.q('{x+y}[1]'))
projection(2)
pykx.LongAtom(pykx.q('3'))
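For comparison, the functools.partial analogy mentioned above looks like this in plain Python. The function add and the name add_one are illustrative only.

```python
from functools import partial

def add(x, y):
    return x + y

# Fix the first parameter (m = 1 of n = 2), leaving one still required.
add_one = partial(add, 1)

result = add_one(2)
print(result)
```

As with the projection, calling add_one supplies only the parameter that was not fixed in advance.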
Accessing and creating PyKX objects¶
Now that we have seen some of the PyKX object types that you will encounter, practically speaking how will they be created in real-world scenarios?
Creating PyKX objects from Pythonic data types¶
One of the most common ways that PyKX data is generated is through conversions from equivalent Pythonic data types. PyKX natively supports conversions to and from the following common Python data formats.
- Python
- Numpy
- Pandas
- PyArrow
In each of the above cases generation of PyKX objects is facilitated through the use of the kx.toq
PyKX function.
pydict = {'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': 2}
kx.toq(pydict)
a | 1 2 3 |
---|---|
b | `a`b`c |
c | 2 |
nparray = np.array([1, 2, 3, 4], dtype = np.int32)
kx.toq(nparray)
pykx.IntVector(pykx.q('1 2 3 4i'))
pdframe = pd.DataFrame(data = {'a':[1, 2, 3], 'b': ['a', 'b', 'c']})
kx.toq(pdframe)
| | a | b |
---|---|---|
0 | 1 | a |
1 | 2 | b |
2 | 3 | c |
Random data generation¶
PyKX provides users with a module for the creation of random data of user specified PyKX types or their equivalent Python types. The creation of random data is useful in prototyping analytics and is used extensively within our documentation when creating test examples.
As a first example, you can generate a list of 1,000,000 random floating point values between 0 and 1 as follows:
kx.random.random(1000000, 1.0)
pykx.FloatVector(pykx.q('0.3927524 0.5170911 0.5159796 0.4066642 0.1780839 0.3017723 0.785033 0.534709..'))
If instead you wish to choose values randomly from a list, supply the list as the second argument to the function:
kx.random.random(5, [kx.LongAtom(1), ['a', 'b', 'c'], np.array([1.1, 1.2, 1.3])])
pykx.List(pykx.q('
1.1 1.2 1.3
1
1.1 1.2 1.3
1
`a`b`c
'))
Random data does not only come in 1-dimensional forms, however: changing the first argument to a list allows you to create multi-dimensional PyKX Lists. The examples below additionally use a PyKX trick whereby nulls/infinities can be used to generate random data across the full allowable range:
kx.random.random([2, 5], kx.GUIDAtom.null)
pykx.List(pykx.q('
9b19ab9c-b26d-d6b3-a8fa-267ba0620848 d8d6c050-964e-6247-e2cd-bf9435389b9a 1c4..
a68f5b00-754e-9863-04aa-8b59cc4e3122 72969cc8-4445-451b-9266-7770a60c3120 0c7..
'))
kx.random.random([2, 3, 4], kx.IntAtom.inf)
pykx.List(pykx.q('
1837510540 373968399 35818431 1421474592 424239201 1727064393 250148680 1..
1566069007 1773121422 2104411811 1441846567 103906494 315107819 931560883 ..
'))
Finally, you can set the seed for random data generation explicitly, giving you consistency over the generated objects. This can be done globally or for individual function calls:
kx.random.seed(10)
kx.random.random(10, 2.0)
pykx.FloatVector(pykx.q('0.1782082 1.669039 0.7243899 1.999868 0.7675971 1.723838 0.1836728 0.5061767 ..'))
kx.random.random(10, 2.0, seed = 10)
pykx.FloatVector(pykx.q('0.1782082 1.669039 0.7243899 1.999868 0.7675971 1.723838 0.1836728 0.5061767 ..'))
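The effect of a per-call seed can be pictured with Python's standard random module. This is an analogy only: sample is a hypothetical helper, and the numbers it produces will not match those generated by PyKX.

```python
import random

def sample(count, upper, seed=None):
    """Draw `count` floats between 0 and `upper` from a seeded generator."""
    rng = random.Random(seed)
    return [rng.uniform(0, upper) for _ in range(count)]

first = sample(10, 2.0, seed=10)
second = sample(10, 2.0, seed=10)

# The same seed reproduces exactly the same sequence.
print(first == second)
```

Using a dedicated generator per call keeps the result reproducible without disturbing any global random state.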
Running q code to generate data¶
As mentioned in the introduction, PyKX provides an entrypoint to the vector programming language q. As such, users of PyKX can execute q code directly within a Python session through calls to kx.q.
Create some q data:
kx.q('0 1 2 3 4')
pykx.LongVector(pykx.q('0 1 2 3 4'))
kx.q('([idx:desc til 5]col1:til 5;col2:5?1f;col3:5?`2)')
idx | col1 | col2 | col3 |
---|---|---|---|
4 | 0 | 0.8619188 | ol |
3 | 1 | 0.09183638 | mg |
2 | 2 | 0.2530883 | cm |
1 | 3 | 0.2504566 | cc |
0 | 4 | 0.7517286 | jg |
Apply arguments to a user specified function x+y
kx.q('{x+y}', kx.LongAtom(1), kx.LongAtom(2))
pykx.LongAtom(pykx.q('3'))
Read data from a CSV file¶
Much of the data you run into in data analysis tasks comes in the form of CSV files. PyKX, similar to Pandas, provides a CSV reader, called via kx.q.read.csv. In the following cell we create a CSV file to be read in using PyKX:
import csv
with open('pykx.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    field = ["name", "age", "height", "country"]
    writer.writerow(field)
    writer.writerow(["Oladele Damilola", "40", "180.0", "Nigeria"])
    writer.writerow(["Alina Hricko", "23", "179.2", "Ukraine"])
    writer.writerow(["Isabel Walter", "50", "179.5", "United Kingdom"])
kx.q.read.csv('pykx.csv', types = {'age': kx.LongAtom, 'country': kx.SymbolAtom})
| | name | age | height | country |
---|---|---|---|---|
0 | "Oladele Damilola" | 40 | 180e | Nigeria |
1 | "Alina Hricko" | 23 | 179.2e | Ukraine |
2 | "Isabel Walter" | 50 | 179.5e | United Kingdom |
import os
os.remove('pykx.csv')
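The idea of supplying per-column types when reading a CSV can be sketched with the standard csv module alone. The converters mapping below is a hypothetical stand-in for the types argument used above, and the data is held in memory rather than written to disk.

```python
import csv
import io

# Same rows as the file above, held in memory so nothing is written to disk.
raw = io.StringIO(
    'name,age,height,country\n'
    'Oladele Damilola,40,180.0,Nigeria\n'
    'Alina Hricko,23,179.2,Ukraine\n'
    'Isabel Walter,50,179.5,United Kingdom\n'
)

# Hypothetical per-column converters; unlisted columns stay as strings.
converters = {'age': int, 'height': float}

rows = [
    {name: converters.get(name, str)(value) for name, value in record.items()}
    for record in csv.DictReader(raw)
]
print(rows[0])
```

Columns without an explicit converter fall back to strings, which mirrors the general pattern of overriding only the columns whose types you care about.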
Querying external Processes via IPC¶
One of the most common usage patterns you will encounter in organisations with access to data in kdb+/q is querying that data from an external server process. In the example below we assume that you have q installed in addition to PyKX, see here to install q alongside the license access for PyKX.
First we set up a q/kdb+ server on port 5000, which we will then populate with some data in the form of a table tab
import subprocess
import time
try:
    # Start a q process listening on port 5000
    proc = subprocess.Popen(
        ('q', '-p', '5000'),
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Give the process time to start before connecting
    time.sleep(2)
except Exception as err:
    raise kx.QError('Unable to create q process on port 5000') from err
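The fixed two-second sleep above is a simple way to wait for the server. One possible alternative, sketched here under the assumption that the server accepts plain TCP connections on its port, is to poll until a connection succeeds. The helper wait_for_port is hypothetical and is demonstrated against a throwaway local listener rather than a q process.

```python
import socket
import time

def wait_for_port(host, port, timeout=5.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)
    return False

# Demonstration against a throwaway local listener rather than a q process.
listener = socket.socket()
listener.bind(('127.0.0.1', 0))  # port 0 asks the OS for any free port
listener.listen()
free_port = listener.getsockname()[1]
ready = wait_for_port('127.0.0.1', free_port, timeout=2.0)
listener.close()
print(ready)
```

Polling returns as soon as the port is reachable and reports failure explicitly if the deadline passes, instead of assuming the process is ready after a fixed delay.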
Once a q process is available you can establish a connection to it for synchronous query execution as follows
conn = kx.SyncQConnection(port = 5000)
You can now run q commands against the q server
conn('tab:([]col1:100?`a`b`c;col2:100?1f;col3:100?0Ng)')
conn('select from tab where col1=`a')
| | col1 | col2 | col3 |
---|---|---|---|
0 | a | 0.01974141 | ddb87915-b672-2c32-a6cf-296061671e9d |
1 | a | 0.5611439 | 580d8c87-e557-0db1-3a19-cb3a44d623b1 |
2 | a | 0.8685452 | 2d948578-e9d6-79a2-8207-9df7a71f0b3b |
3 | a | 0.3460797 | cddeceef-9ee9-3847-9172-3e3d7ab39b26 |
4 | a | 0.5046331 | 1c22a468-9492-2173-9e4f-9003a23d02b7 |
5 | a | 0.765905 | 5e9cd21b-88c5-bbf5-7215-6409e115a2a4 |
6 | a | 0.8794685 | 3462beab-42ee-ccad-989b-8d69f070dffc |
7 | a | 0.02487862 | bc150163-c551-0eba-8871-9767f5c0e3d5 |
... | ... | ... | ... |
36 | a | 0.9929108 | 03a9b290-95c8-c3b8-fb9a-9ac9874763b8 |
37 rows × 3 columns
Or use the PyKX query API
conn.qsql.select('tab', where=['col1=`a', 'col2<0.3'])
| | col1 | col2 | col3 |
---|---|---|---|
0 | a | 0.01974141 | ddb87915-b672-2c32-a6cf-296061671e9d |
1 | a | 0.02487862 | bc150163-c551-0eba-8871-9767f5c0e3d5 |
2 | a | 0.2073435 | ee853957-d502-d30d-5945-bf8c97022332 |
3 | a | 0.2188574 | d9a3e171-b1cf-0271-507a-0fba0b52e6ff |
4 | a | 0.1451855 | ea4d0269-375c-d73b-96f0-6bb6334ca423 |
5 | a | 0.1497004 | 1cce6bdd-e34b-ba4f-8c01-31d098d81221 |
6 | a | 0.166486 | 6417d4b3-3fc6-e35a-1c34-8c5c3327b1e8 |
7 | a | 0.2643322 | f294c3cb-a6da-e15d-c8e0-3a848d2abf10 |
8 | a | 0.07841939 | 020715aa-8ffa-e1d3-9c68-3ad7919d4f5e |
Or use PyKX's context interface to run SQL server side if it's available to you
conn('\\l s.k_')
conn.sql('SELECT * FROM tab where col2>=0.5')
| | col1 | col2 | col3 |
---|---|---|---|
0 | a | 0.5611439 | 580d8c87-e557-0db1-3a19-cb3a44d623b1 |
1 | a | 0.8685452 | 2d948578-e9d6-79a2-8207-9df7a71f0b3b |
2 | b | 0.7716917 | 52cb20d9-f12c-9963-2829-3c64d8d8cb14 |
3 | a | 0.5046331 | 1c22a468-9492-2173-9e4f-9003a23d02b7 |
4 | c | 0.6014692 | 7ea4d431-4dec-3017-3d13-cc9ef7f1c0ee |
5 | c | 0.5000071 | 782c5346-f5f7-b90e-c686-8d41fa85233b |
6 | c | 0.8392881 | 245f5516-0cb8-391a-e1e5-fadddc8e54ba |
7 | b | 0.5938637 | e30bab29-2df0-3fb0-535f-58d1e7bd83c0 |
... | ... | ... | ... |
55 | b | 0.8236115 | f2c41bca-67df-aa6c-4730-bca38cbd6825 |
56 rows × 3 columns
Finally the q server used for this demonstration can be shut down
proc.stdin.close()
proc.kill()
Running analytics on objects in PyKX¶
Like many Python libraries, including Numpy and Pandas, PyKX provides a number of ways to apply analytics to its data, both those defined within the library and those you have written yourself.
Using in-built methods on PyKX Vectors¶
When you are interacting with PyKX Vectors you may wish to gain insights into these objects through the application of basic analytics such as calculation of the mean
/median
/mode
of the vector
q_vector = kx.random.random(1000, 10.0)
q_vector.mean()
pykx.FloatAtom(pykx.q('4.984157'))
q_vector.max()
pykx.FloatAtom(pykx.q('9.998212'))
The above is useful for basic analysis but will not be sufficient for more bespoke analytics on these vectors. To give you more control over the analytics run, you can also use the apply method.
def bespoke_function(x, y):
    return x*y
q_vector.apply(bespoke_function, 5)
pykx.FloatVector(pykx.q('31.74132 38.3376 46.40922 10.17963 38.73944 48.33864 41.12562 45.44382 32.290..'))
Using in-built methods on PyKX Tables¶
In addition to the vector processing capabilities of PyKX, your ability to operate on tabular structures is also important. These methods, described in greater depth in the Pandas-Like API documentation here, allow you to apply functions and gain insights into your data in a way that is familiar.
In the example below you will use combinations of the most commonly used elements of this Table API, operating on the following table:
N = 1000000
example_table = kx.Table(data = {
    'sym' : kx.random.random(N, ['a', 'b', 'c']),
    'col1' : kx.random.random(N, 10.0),
    'col2' : kx.random.random(N, 20)
    }
)
example_table
| | sym | col1 | col2 |
---|---|---|---|
0 | b | 7.782944 | 6 |
1 | c | 0.5899977 | 17 |
2 | c | 2.580528 | 8 |
3 | b | 5.651351 | 10 |
4 | b | 2.336329 | 11 |
5 | b | 2.87167 | 17 |
6 | c | 9.705893 | 9 |
7 | a | 5.729889 | 8 |
... | ... | ... | ... |
999999 | c | 8.862285 | 6 |
1,000,000 rows × 3 columns
You can search for and filter data within your tables using loc
similarly to how this is achieved by Pandas as follows
example_table.loc[example_table['sym'] == 'a']
| | sym | col1 | col2 |
---|---|---|---|
0 | a | 5.729889 | 8 |
1 | a | 4.396508 | 13 |
2 | a | 0.7636906 | 19 |
3 | a | 9.904306 | 17 |
4 | a | 1.439738 | 10 |
5 | a | 2.898631 | 19 |
6 | a | 2.360396 | 2 |
7 | a | 1.932728 | 12 |
... | ... | ... | ... |
332823 | a | 6.653308 | 18 |
332,824 rows × 3 columns
This behavior is also incorporated when retrieving data from a table through the __getitem__ method (that is, square-bracket indexing), as you can see here:
example_table[example_table['sym'] == 'b']
| | sym | col1 | col2 |
---|---|---|---|
0 | b | 7.782944 | 6 |
1 | b | 5.651351 | 10 |
2 | b | 2.336329 | 11 |
3 | b | 2.87167 | 17 |
4 | b | 2.917054 | 2 |
5 | b | 7.093562 | 18 |
6 | b | 1.715391 | 10 |
7 | b | 4.231884 | 0 |
... | ... | ... | ... |
333014 | b | 9.361253 | 17 |
333,015 rows × 3 columns
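Conceptually, both filters above work in two steps: a comparison produces one boolean per row, and that mask then selects rows from every column. The plain-Python sketch below illustrates the idea on a tiny made-up table; it is not how PyKX evaluates the filter internally.

```python
table = {
    'sym': ['b', 'c', 'a', 'b'],
    'col1': [7.8, 0.6, 5.7, 2.3],
}

# Step 1: the comparison yields one boolean per row.
mask = [sym == 'b' for sym in table['sym']]

# Step 2: every column keeps only the positions where the mask is True.
filtered = {
    name: [value for value, keep in zip(column, mask) if keep]
    for name, column in table.items()
}
print(filtered)
```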
You can additionally set the index columns of the table. When dealing with PyKX tables, this converts the table from a pykx.Table object to a pykx.KeyedTable object:
example_table.set_index('sym')
sym | col1 | col2 |
---|---|---|
b | 7.782944 | 6 |
c | 0.5899977 | 17 |
c | 2.580528 | 8 |
b | 5.651351 | 10 |
b | 2.336329 | 11 |
b | 2.87167 | 17 |
c | 9.705893 | 9 |
a | 5.729889 | 8 |
... | ... | ... |
c | 8.862285 | 6 |
1,000,000 rows × 3 columns
In addition to basic data manipulation such as index setting, you also get access to analytic capabilities such as the calculation of the mean and median, as demonstrated here:
print('mean:')
print(example_table.mean(numeric_only = True))
print('median:')
print(example_table.median(numeric_only = True))
mean:
col1| 4.998412
col2| 9.497452
median:
col1| 4.996685
col2| 9
You can make use of the groupby method to group PyKX tabular data, which can then be used for analytic application.
As a first example, let's group the dataset based on the sym column and then calculate the mean of each column for each sym:
example_table.groupby('sym').mean()
sym | col1 | col2 |
---|---|---|
a | 5.00519 | 9.49375 |
b | 5.000742 | 9.501077 |
c | 4.989338 | 9.497527 |
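The group-then-aggregate pattern can be written out in plain Python to show what is being computed. The data below is made up, and the sketch makes no claim about how PyKX performs the grouping.

```python
from collections import defaultdict

syms = ['a', 'b', 'a', 'c', 'b']
col1 = [1.0, 2.0, 3.0, 4.0, 6.0]

# Collect the values belonging to each group key.
groups = defaultdict(list)
for sym, value in zip(syms, col1):
    groups[sym].append(value)

# Reduce each group to its mean, ordered by key.
group_means = {sym: sum(vals) / len(vals) for sym, vals in sorted(groups.items())}
print(group_means)
```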
As an extension to the above groupby, you can now consider a more complex example which makes use of numpy to run some calculations on the PyKX data. You will see later that this can be simplified further in this specific use case.
def apply_func(x):
    nparray = x.np()
    return np.sqrt(nparray).mean()
example_table.groupby('sym').apply(apply_func)
sym | col1 | col2 |
---|---|---|
a | 2.109397 | 2.859095 |
b | 2.108571 | 2.860037 |
c | 2.105694 | 2.859527 |
Time-series specific joining of data can be completed using merge_asof joins. This example uses two tables containing temporal information, namely a trades and a quotes table:
trades = kx.Table(data={
    "time": [
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.030"),
        pd.Timestamp("2016-05-25 13:30:00.041"),
        pd.Timestamp("2016-05-25 13:30:00.048"),
        pd.Timestamp("2016-05-25 13:30:00.049"),
        pd.Timestamp("2016-05-25 13:30:00.072"),
        pd.Timestamp("2016-05-25 13:30:00.075")
    ],
    "ticker": [
        "GOOG",
        "MSFT",
        "MSFT",
        "MSFT",
        "GOOG",
        "AAPL",
        "GOOG",
        "MSFT"
    ],
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
quotes = kx.Table(data={
    "time": [
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.038"),
        pd.Timestamp("2016-05-25 13:30:00.048"),
        pd.Timestamp("2016-05-25 13:30:00.048"),
        pd.Timestamp("2016-05-25 13:30:00.048")
    ],
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    "price": [51.95, 51.95, 720.77, 720.92, 98.0],
    "quantity": [75, 155, 100, 100, 100]
})
print('trades:')
display(trades)
print('quotes:')
display(quotes)
trades:
| | time | ticker | bid | ask |
---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | GOOG | 720.5 | 720.93 |
1 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 51.96 |
2 | 2016.05.25D13:30:00.030000000 | MSFT | 51.97 | 51.98 |
3 | 2016.05.25D13:30:00.041000000 | MSFT | 51.99 | 52f |
4 | 2016.05.25D13:30:00.048000000 | GOOG | 720.5 | 720.93 |
5 | 2016.05.25D13:30:00.049000000 | AAPL | 97.99 | 98.01 |
6 | 2016.05.25D13:30:00.072000000 | GOOG | 720.5 | 720.88 |
7 | 2016.05.25D13:30:00.075000000 | MSFT | 52.01 | 52.03 |
quotes:
| | time | ticker | price | quantity |
---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 75 |
1 | 2016.05.25D13:30:00.038000000 | MSFT | 51.95 | 155 |
2 | 2016.05.25D13:30:00.048000000 | GOOG | 720.77 | 100 |
3 | 2016.05.25D13:30:00.048000000 | GOOG | 720.92 | 100 |
4 | 2016.05.25D13:30:00.048000000 | AAPL | 98f | 100 |
When applying the asof join you can additionally use named arguments to make it possible to distinguish which table each column originates from. In this case the columns are suffixed with _trades and _quotes:
trades.merge_asof(quotes, on='time', suffixes=('_trades', '_quotes'))
| | time | ticker_trades | bid | ask | ticker_quotes | price | quantity |
---|---|---|---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | GOOG | 720.5 | 720.93 | MSFT | 51.95 | 75 |
1 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 51.96 | MSFT | 51.95 | 75 |
2 | 2016.05.25D13:30:00.030000000 | MSFT | 51.97 | 51.98 | MSFT | 51.95 | 75 |
3 | 2016.05.25D13:30:00.041000000 | MSFT | 51.99 | 52f | MSFT | 51.95 | 155 |
4 | 2016.05.25D13:30:00.048000000 | GOOG | 720.5 | 720.93 | AAPL | 98f | 100 |
5 | 2016.05.25D13:30:00.049000000 | AAPL | 97.99 | 98.01 | AAPL | 98f | 100 |
6 | 2016.05.25D13:30:00.072000000 | GOOG | 720.5 | 720.88 | AAPL | 98f | 100 |
7 | 2016.05.25D13:30:00.075000000 | MSFT | 52.01 | 52.03 | AAPL | 98f | 100 |
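The matching rule behind the result above (for each trade, take the last quote whose time is not later than the trade time) can be sketched with the standard bisect module. The helper asof_index is hypothetical, times are reduced to milliseconds past 13:30:00, and this is an illustration of the rule rather than the implementation used by merge_asof.

```python
from bisect import bisect_right

# Times in milliseconds past 13:30:00, taken from the tables above.
trade_times = [23, 23, 30, 41, 48, 49, 72, 75]
quote_times = [23, 38, 48, 48, 48]  # must be sorted ascending
quote_tickers = ['MSFT', 'MSFT', 'GOOG', 'GOOG', 'AAPL']

def asof_index(sorted_times, t):
    """Index of the last entry with time <= t, or None if there is none."""
    position = bisect_right(sorted_times, t) - 1
    return position if position >= 0 else None

matches = [asof_index(quote_times, t) for t in trade_times]
matched_tickers = [quote_tickers[i] for i in matches]
print(matches)
print(matched_tickers)
```

Because the join here is on time alone, a trade can be paired with a quote for a different ticker, which is why the ticker_trades and ticker_quotes columns above do not always agree.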
Using PyKX/q native functions¶
While the Pandas-Like API and the methods provided on PyKX Vectors are an effective way of applying analytics to PyKX data, the most efficient and performant way to run analytics on your data is through the PyKX/q primitives, which are available through the kx.q module.
These include functionality for the calculation of moving averages, application of asof/window joins, column reversal etc. A full list of the available functions and some examples of their usage can be found here.
Here are a few examples of how you can use these functions, broken into sections for convenience:
Mathematical functions¶
mavg¶
Calculate a series of average values across a list using a rolling window
kx.q.mavg(10, kx.random.random(10000, 2.0))
pykx.FloatVector(pykx.q('1.469756 1.029263 0.7352848 0.5950915 0.7071875 0.8486546 0.910078 0.95322 1...'))
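One way to picture what a rolling-window average computes is the plain-Python sketch below. It assumes the convention that the first few results average over only the values seen so far; moving_average is an illustrative helper, not a PyKX function.

```python
def moving_average(window, values):
    """Average of the last `window` values at each position."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that has just left the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out

averages = moving_average(3, [1, 2, 3, 4, 5])
print(averages)
```

Keeping a running sum means each step does a constant amount of work regardless of the window size.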
cor¶
Calculate the correlation between two lists
kx.q.cor([1, 2, 3], [2, 3, 4])
pykx.FloatAtom(pykx.q('1f'))
kx.q.cor(kx.random.random(100, 1.0), kx.random.random(100, 1.0))
pykx.FloatAtom(pykx.q('0.02687833'))
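For reference, the Pearson correlation of two equal-length lists can be written out in plain Python as below. The function pearson is an illustrative helper showing the standard formula, not the code that runs inside q.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson([1, 2, 3], [2, 3, 4])
print(r)
```

The two lists in this example differ by a constant offset, so they are perfectly correlated.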
prds¶
Calculate the cumulative product across a supplied list
kx.q.prds([1, 2, 3, 4, 5])
pykx.LongVector(pykx.q('1 2 6 24 120'))
kx.q.each(kx.q('{prd x}'), kx.random.random([5, 5], 10.0, seed=10))
pykx.FloatVector(pykx.q('1033.597 377.1784 7126.713 418.3232 89.97531'))
kx.q('{prd x}').each(kx.random.random([5, 5], 10.0, seed=10))
pykx.FloatVector(pykx.q('1033.597 377.1784 7126.713 418.3232 89.97531'))
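The difference between a cumulative product and a per-row product can be seen with the Python standard library alone. The input values here are made up for illustration.

```python
from itertools import accumulate
from math import prod
from operator import mul

# Cumulative product: one running result per input element.
running_products = list(accumulate([1, 2, 3, 4, 5], mul))

# Per-row product: one result per inner list.
row_products = [prod(row) for row in [[1, 2], [3, 4], [5, 6]]]

print(running_products, row_products)
```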
qtab = kx.Table(data = {
    'x' : kx.random.random(1000, ['a', 'b', 'c']).grouped(),
    'y' : kx.random.random(1000, 1.0),
    'z' : kx.random.random(1000, kx.TimestampAtom.inf)
})
kx.q.meta(qtab)
c | t | f | a |
---|---|---|---|
x | "s" | | g |
y | "f" | | |
z | "p" | | |
xasc¶
Sort a table in ascending order by the contents of a specified column
kx.q.xasc('z', qtab)
| | x | y | z |
---|---|---|---|
0 | c | 0.2660419 | 2000.09.17D00:27:33.222932480 |
1 | b | 0.2378591 | 2001.02.01D19:58:48.496586752 |
2 | c | 0.05802967 | 2001.05.29D15:29:16.181340160 |
3 | c | 0.9474748 | 2003.03.24D08:12:02.975653888 |
4 | b | 0.02726729 | 2004.01.31D07:25:21.959215104 |
5 | b | 0.08927731 | 2004.12.31D23:50:54.425055232 |
6 | c | 0.2256163 | 2005.07.12D10:45:38.423119872 |
7 | b | 0.1675316 | 2006.04.19D21:31:40.507750400 |
... | ... | ... | ... |
999 | a | 0.4414727 | 2292.03.15D06:41:24.638662656 |
1,000 rows × 3 columns
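Sorting a column-oriented table by one column amounts to computing a single ordering from that column and applying it to every column. The plain-Python sketch below shows the idea on a tiny made-up table; it does not describe the internals of xasc.

```python
table = {
    'x': ['c', 'b', 'a'],
    'z': [30, 10, 20],
}

# Permutation that would put column 'z' in ascending order.
order = sorted(range(len(table['z'])), key=table['z'].__getitem__)

# Apply the same permutation to every column so rows stay aligned.
sorted_table = {
    name: [column[i] for i in order]
    for name, column in table.items()
}
print(sorted_table)
```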