PyKX introduction notebook¶
The purpose of this notebook is to introduce you to PyKX capabilities and functionality.
For the best experience, read the "What is PyKX" page and the quickstart guide first.
To follow along, we recommend downloading the notebook.
Now let's go through the following sections:
1. Import PyKX¶
To access PyKX and its functions, import it in your Python code as follows:
import pykx as kx
kx.q.system.console_size = [10, 80]
Tip: We recommend always using import pykx as kx. The shortened import name kx makes the code more readable and is standard for the PyKX library.
Below we load additional libraries used throughout this notebook:
import numpy as np
import pandas as pd
2. Basic PyKX data structures¶
Central to your interaction with PyKX are the data types supported by the library. PyKX is built atop the q programming language, which provides small-footprint data structures for analytic calculations and for building high-performance databases. The types shown below are generated from their Python equivalents.
This section describes the basic elements in the PyKX library and explains how and why they differ:
- 2.1 Atom
- 2.2 Vector
- 2.3 List
- 2.4 Dictionary
- 2.5 Table
- 2.6 Other data types
2.1 Atom¶
In PyKX, an atom is a single irreducible value of a specific type. For example, you may come across pykx.FloatAtom or pykx.DateAtom objects, which can be generated from an equivalent Python representation as follows:
kx.FloatAtom(1.0)
pykx.FloatAtom(pykx.q('1f'))
from datetime import date
kx.DateAtom(date(2020, 1, 1))
pykx.DateAtom(pykx.q('2020.01.01'))
2.2 Vector¶
Like PyKX atoms, PyKX Vectors are strictly typed, but they hold multiple elements of a single type. Together with the lists described below, these objects form the basis for most of the other important data structures that you will encounter, including dictionaries and tables.
Compared with Python lists, vector objects provide significant benefits when applying analytics. Like Numpy, PyKX gains from the underlying speed of its analytic engine when operating on these strictly typed objects.
Vector objects are always 1-D and are indexed along a single axis.
In the following example, we create PyKX vectors from the equivalent numpy and pandas objects:
kx.IntVector(np.array([1, 2, 3, 4], dtype=np.int32))
pykx.IntVector(pykx.q('1 2 3 4i'))
kx.toq(pd.Series([1, 2, 3, 4]))
pykx.LongVector(pykx.q('1 2 3 4'))
2.3 List¶
A PyKX List is an untyped vector object. Unlike vectors, which are optimised for analytic performance, lists are mostly used for storing reference information or matrix data.
Unlike vector objects, which are 1-D in shape, lists can be ragged N-dimensional objects. This makes them useful for storing complex data structures, but limits their performance for data-access and data-modification tasks.
kx.List([[1, 2, 3], [1.0, 1.1, 1.2], ['a', 'b', 'c']])
pykx.List(pykx.q('
1 2 3
1 1.1 1.2
a b c
'))
2.4 Dictionary¶
A PyKX Dictionary is a mapping from keys to values. The list of keys and the list of values must have the same count. Although you can think of it as a set of key-value pairs, it is physically stored as a pair of lists.
kx.Dictionary({'x': [1, 2, 3], 'x1': np.array([1, 2, 3])})
x | 1 2 3 |
---|---|
x1 | 1 2 3 |
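As a rough mental model of the pair-of-lists storage described above, the plain-Python sketch below keeps keys and values in two equal-length lists and finds a value by position. The names `keys`, `values`, and `lookup` are illustrative assumptions and are not part of PyKX.

```python
# Plain-Python illustration only; not how PyKX is implemented internally.
keys = ['x', 'x1']
values = [[1, 2, 3], [1, 2, 3]]

# Keys and values must have the same count.
assert len(keys) == len(values)

def lookup(key):
    """Return the value stored at the same position as `key`."""
    return values[keys.index(key)]

print(lookup('x1'))  # [1, 2, 3]
```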
2.5 Table¶
PyKX Tables are a first-class typed entity that lives in memory. They are a collection of named columns implemented as a dictionary. This mapping construct means that PyKX tables are column-oriented, which makes analytic operations on columns much faster than for a relational database equivalent.
PyKX Tables come in many forms, but the key table types are as follows:
pykx.Table
pykx.KeyedTable
pykx.SplayedTable
pykx.PartitionedTable
In this section we demonstrate the first two, which are the in-memory table types.
pykx.Table¶
print(kx.Table([[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']], columns = ['col1', 'col2', 'col3']))
col1 col2 col3
--------------
1    2    a
2    3    b
3    4    c
print(kx.Table(data = {'col1': [1, 2, 3], 'col2': [2 , 3, 4], 'col3': ['a', 'b', 'c']}))
col1 col2 col3
--------------
1    2    a
2    3    b
3    4    c
Without print, the same tables render as follows:
kx.Table([[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']],
columns = ['col1', 'col2', 'col3'])
| | col1 | col2 | col3 |
---|---|---|---|
0 | 1 | 2 | a |
1 | 2 | 3 | b |
2 | 3 | 4 | c |
kx.Table(data = {
'col1': [1, 2, 3],
'col2': [2 , 3, 4],
'col3': ['a', 'b', 'c']})
| | col1 | col2 | col3 |
---|---|---|---|
0 | 1 | 2 | a |
1 | 2 | 3 | b |
2 | 3 | 4 | c |
pykx.KeyedTable¶
kx.Table(data = {'x': [1, 2, 3], 'x1': [2, 3, 4], 'x2': ['a', 'b', 'c']}
).set_index(['x'])
x | x1 | x2 |
---|---|---|
1 | 2 | a |
2 | 3 | b |
3 | 4 | c |
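To make the column-oriented layout described in this section more concrete, here is a minimal plain-Python sketch that models a table as a dictionary of columns. It is an illustration only; the names `table`, `col2_mean`, and `row` are assumptions and do not reflect PyKX internals.

```python
# Plain-Python illustration of a column-oriented table; not PyKX internals.
table = {
    'col1': [1, 2, 3],
    'col2': [2, 3, 4],
    'col3': ['a', 'b', 'c'],
}

# A column aggregate only needs to read one list.
col2_mean = sum(table['col2']) / len(table['col2'])

def row(i):
    """Assemble a row by picking position i from every column."""
    return {name: column[i] for name, column in table.items()}

print(col2_mean)  # 3.0
print(row(0))     # {'col1': 1, 'col2': 2, 'col3': 'a'}
```

Column aggregates touch a single list, while reading a row has to visit every column. That trade-off is what favours analytic workloads in a column-oriented design.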
2.6 Other data types¶
Below we outline some of the important PyKX data type structures that you will run into throughout the rest of this notebook.
pykx.Lambda¶
A pykx.Lambda is the most basic kind of function within PyKX. Lambdas take between 0 and 8 parameters and are the building blocks for most analytics that users write when interacting with data from PyKX.
pykx_lambda = kx.q('{x+y}')
type(pykx_lambda)
pykx.wrappers.Lambda
pykx_lambda(1, 2)
pykx.LongAtom(pykx.q('3'))
pykx.Projection¶
As with functools.partial, functions in PyKX can have some of their parameters set in advance. The result is a new function called a projection. When you call the projection, the preset parameters are no longer required and cannot be provided.
If the original function has n total parameters and m are provided, the result is a projection that requires the remaining n-m parameters.
projection = kx.q('{x+y}')(1)
projection
pykx.Projection(pykx.q('{x+y}[1]'))
projection(2)
pykx.LongAtom(pykx.q('3'))
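For comparison, the functools.partial analogy mentioned above can be written in plain Python as follows. The function `add` and the name `add_one` are illustrative stand-ins for the q lambda and its projection.

```python
from functools import partial

def add(x, y):
    return x + y

# Fix the first parameter in advance, as kx.q('{x+y}')(1) does above.
add_one = partial(add, 1)

print(add_one(2))  # 3
```

Here n is 2 and m is 1, so the resulting callable expects the one remaining parameter.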
3. Access and create PyKX objects¶
Now that you're familiar with the PyKX object types, let's see how they work in real-world scenarios, such as:
- 3.1 Create PyKX objects from Pythonic data types
- 3.2 Random data generation
- 3.3 Run q code to generate data
- 3.4 Read data from a CSV file
- 3.5 Query external processes via IPC
3.1 Create PyKX objects from Pythonic data types¶
One of the most common ways to generate PyKX data is by converting equivalent Pythonic data types. PyKX natively supports conversions to and from the following common Python data formats:
- Python
- Numpy
- Pandas
- PyArrow
You can generate PyKX objects by using the kx.toq function:
pydict = {'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': 2}
kx.toq(pydict)
a | 1 2 3 |
---|---|
b | `a`b`c |
c | 2 |
nparray = np.array([1, 2, 3, 4], dtype = np.int32)
kx.toq(nparray)
pykx.IntVector(pykx.q('1 2 3 4i'))
pdframe = pd.DataFrame(data = {'a':[1, 2, 3], 'b': ['a', 'b', 'c']})
kx.toq(pdframe)
| | a | b |
---|---|---|
0 | 1 | a |
1 | 2 | b |
2 | 3 | c |
3.2 Random data generation¶
PyKX provides a module to create random data of user-specified PyKX types or their equivalent Python types. The creation of random data helps in prototyping analytics.
As a first example, generate a list of 1,000,000 random floating-point values between 0 and 1 as follows:
kx.random.random(1000000, 1.0)
pykx.FloatVector(pykx.q('0.3927524 0.5170911 0.5159796 0.4066642 0.1780839 0.3017723 0.785033 0.534709..'))
If you wish to choose values randomly from a list, use the list as the second argument to your function:
kx.random.random(5, [kx.LongAtom(1), ['a', 'b', 'c'], np.array([1.1, 1.2, 1.3])])
pykx.List(pykx.q('
1.1 1.2 1.3
1
1.1 1.2 1.3
1
`a`b`c
'))
Random data does not only come in 1-dimensional forms. To create multi-dimensional PyKX Lists, pass a list as the first argument. The following examples also use a PyKX trick: passing a null or infinity generates random data across the full allowable range of that type:
kx.random.random([2, 5], kx.GUIDAtom.null)
pykx.List(pykx.q('
9b19ab9c-b26d-d6b3-a8fa-267ba0620848 d8d6c050-964e-6247-e2cd-bf9435389b9a 1c4..
a68f5b00-754e-9863-04aa-8b59cc4e3122 72969cc8-4445-451b-9266-7770a60c3120 0c7..
'))
kx.random.random([2, 3, 4], kx.IntAtom.inf)
pykx.List(pykx.q('
1837510540 373968399 35818431 1421474592 424239201 1727064393 250148680 1..
1566069007 1773121422 2104411811 1441846567 103906494 315107819 931560883 ..
'))
Finally, to make the generated objects reproducible, set the seed for random data generation explicitly. You can do this globally or for individual function calls:
kx.random.seed(10)
kx.random.random(10, 2.0)
pykx.FloatVector(pykx.q('0.1782082 1.669039 0.7243899 1.999868 0.7675971 1.723838 0.1836728 0.5061767 ..'))
kx.random.random(10, 2.0, seed = 10)
pykx.FloatVector(pykx.q('0.1782082 1.669039 0.7243899 1.999868 0.7675971 1.723838 0.1836728 0.5061767 ..'))
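The difference between global and per-call seeding can be illustrated with Python's standard random module. This sketch does not use PyKX's generator, so the values differ from those above. The helper `seeded_uniform` is a hypothetical name introduced for the example.

```python
import random

# Global seeding: affects every later call on the module-level generator.
random.seed(10)
first = [random.uniform(0, 2.0) for _ in range(10)]

def seeded_uniform(n, upper, seed):
    """Per-call seeding: a private generator leaves the global state untouched."""
    rng = random.Random(seed)
    return [rng.uniform(0, upper) for _ in range(n)]

second = seeded_uniform(10, 2.0, seed=10)
print(first == second)  # True
```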
3.3 Run q code to generate data¶
PyKX is an entry point to the vector programming language q. This means that PyKX users can execute q code directly within a Python session by calling kx.q.
For example, to create q data, run the following command:
kx.q('0 1 2 3 4')
pykx.LongVector(pykx.q('0 1 2 3 4'))
kx.q('([idx:desc til 5]col1:til 5;col2:5?1f;col3:5?`2)')
idx | col1 | col2 | col3 |
---|---|---|---|
4 | 0 | 0.8619188 | ol |
3 | 1 | 0.09183638 | mg |
2 | 2 | 0.2530883 | cm |
1 | 3 | 0.2504566 | cc |
0 | 4 | 0.7517286 | jg |
Next, apply arguments to a user-specified function, x+y:
kx.q('{x+y}', kx.LongAtom(1), kx.LongAtom(2))
pykx.LongAtom(pykx.q('3'))
3.4 Read data from a CSV file¶
Much of the data that you run into in data analysis tasks comes in the form of CSV files. PyKX, like Pandas, provides a CSV reader, called via kx.q.read.csv. In the next cell we create a CSV file that can be read by PyKX:
import csv
with open('pykx.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    field = ["name", "age", "height", "country"]
    writer.writerow(field)
    writer.writerow(["Oladele Damilola", "40", "180.0", "Nigeria"])
    writer.writerow(["Alina Hricko", "23", "179.2", "Ukraine"])
    writer.writerow(["Isabel Walter", "50", "179.5", "United Kingdom"])
kx.q.read.csv('pykx.csv', types = {'age': kx.LongAtom, 'country': kx.SymbolAtom})
| | name | age | height | country |
---|---|---|---|---|
0 | "Oladele Damilola" | 40 | 180e | Nigeria |
1 | "Alina Hricko" | 23 | 179.2e | Ukraine |
2 | "Isabel Walter" | 50 | 179.5e | United Kingdom |
import os
os.remove('pykx.csv')
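The idea of assigning a type to individual columns while reading a CSV can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not PyKX's reader: `read_typed_csv` and `converters` are hypothetical names, and the data is held in memory rather than read from pykx.csv.

```python
import csv
import io

# Same rows as the file written above, held in memory for the example.
raw = io.StringIO(
    "name,age,height,country\n"
    "Oladele Damilola,40,180.0,Nigeria\n"
    "Alina Hricko,23,179.2,Ukraine\n"
    "Isabel Walter,50,179.5,United Kingdom\n"
)

# Hypothetical per-column converters; unlisted columns stay as strings.
converters = {'age': int, 'height': float}

def read_typed_csv(handle, converters):
    """Read a CSV into a dict of columns, converting the listed columns."""
    reader = csv.DictReader(handle)
    columns = {name: [] for name in reader.fieldnames}
    for record in reader:
        for name, value in record.items():
            columns[name].append(converters.get(name, str)(value))
    return columns

columns = read_typed_csv(raw, converters)
print(columns['age'])     # [40, 23, 50]
print(columns['height'])  # [180.0, 179.2, 179.5]
```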
3.5 Query external processes via IPC¶
One of the most common usage patterns in organizations with access to data in kdb+/q is to query data from an external server process. For the example below you need to install q.
First, set up a q/kdb+ server on port 5000. It is populated with a table named tab in a later step:
import subprocess
import time
try:
    # Launch a q server on port 5000 in a child process
    with kx.PyKXReimport():
        proc = subprocess.Popen(
            ('q', '-p', '5000')
        )
    # Give the server time to start
    time.sleep(2)
except Exception:
    raise kx.QError('Unable to create q process on port 5000')
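The fixed two-second sleep above assumes the server is ready within that time. One possible alternative, sketched here with the standard library only, is to poll the port until it accepts a TCP connection. The helper `wait_for_port` and its parameters are hypothetical and not part of PyKX.

```python
import socket
import time

def wait_for_port(port, host='localhost', timeout=10.0, interval=0.1):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not ready yet: wait briefly and try again.
            time.sleep(interval)
    return False
```

Under these assumptions, calling wait_for_port(5000) after subprocess.Popen would replace the time.sleep(2) call and report a failure if the server never comes up.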
Once a q process is available, connect to it for synchronous query execution:
conn = kx.SyncQConnection(port = 5000)
You can now run q commands against the q server:
conn('tab:([]col1:100?`a`b`c;col2:100?1f;col3:100?0Ng)')
conn('select from tab where col1=`a')
| | col1 | col2 | col3 |
---|---|---|---|
0 | a | 0.01974141 | ddb87915-b672-2c32-a6cf-296061671e9d |
1 | a | 0.5611439 | 580d8c87-e557-0db1-3a19-cb3a44d623b1 |
2 | a | 0.8685452 | 2d948578-e9d6-79a2-8207-9df7a71f0b3b |
3 | a | 0.3460797 | cddeceef-9ee9-3847-9172-3e3d7ab39b26 |
4 | a | 0.5046331 | 1c22a468-9492-2173-9e4f-9003a23d02b7 |
5 | a | 0.765905 | 5e9cd21b-88c5-bbf5-7215-6409e115a2a4 |
6 | a | 0.8794685 | 3462beab-42ee-ccad-989b-8d69f070dffc |
7 | a | 0.02487862 | bc150163-c551-0eba-8871-9767f5c0e3d5 |
8 | a | 0.3664924 | dd6b4a2b-c046-e464-a0b9-efb96ed5f0eb |
... | ... | ... | ... |
36 | a | 0.9929108 | 03a9b290-95c8-c3b8-fb9a-9ac9874763b8 |
37 rows × 3 columns
Alternatively, use the PyKX query API:
conn.qsql.select('tab', where=['col1=`a', 'col2<0.3'])
| | col1 | col2 | col3 |
---|---|---|---|
0 | a | 0.01974141 | ddb87915-b672-2c32-a6cf-296061671e9d |
1 | a | 0.02487862 | bc150163-c551-0eba-8871-9767f5c0e3d5 |
2 | a | 0.2073435 | ee853957-d502-d30d-5945-bf8c97022332 |
3 | a | 0.2188574 | d9a3e171-b1cf-0271-507a-0fba0b52e6ff |
4 | a | 0.1451855 | ea4d0269-375c-d73b-96f0-6bb6334ca423 |
5 | a | 0.1497004 | 1cce6bdd-e34b-ba4f-8c01-31d098d81221 |
6 | a | 0.166486 | 6417d4b3-3fc6-e35a-1c34-8c5c3327b1e8 |
7 | a | 0.2643322 | f294c3cb-a6da-e15d-c8e0-3a848d2abf10 |
8 | a | 0.07841939 | 020715aa-8ffa-e1d3-9c68-3ad7919d4f5e |
9 | a | 0.08077328 | 65b2f5b0-918c-b87b-4fc4-4aa24b192476 |
Or use PyKX's context interface to run SQL server side if you have access to it:
conn(r'\l s.k_')
conn.sql('SELECT * FROM tab where col2>=0.5')
| | col1 | col2 | col3 |
---|---|---|---|
0 | a | 0.5611439 | 580d8c87-e557-0db1-3a19-cb3a44d623b1 |
1 | a | 0.8685452 | 2d948578-e9d6-79a2-8207-9df7a71f0b3b |
2 | b | 0.7716917 | 52cb20d9-f12c-9963-2829-3c64d8d8cb14 |
3 | a | 0.5046331 | 1c22a468-9492-2173-9e4f-9003a23d02b7 |
4 | c | 0.6014692 | 7ea4d431-4dec-3017-3d13-cc9ef7f1c0ee |
5 | c | 0.5000071 | 782c5346-f5f7-b90e-c686-8d41fa85233b |
6 | c | 0.8392881 | 245f5516-0cb8-391a-e1e5-fadddc8e54ba |
7 | b | 0.5938637 | e30bab29-2df0-3fb0-535f-58d1e7bd83c0 |
8 | a | 0.765905 | 5e9cd21b-88c5-bbf5-7215-6409e115a2a4 |
... | ... | ... | ... |
55 | b | 0.8236115 | f2c41bca-67df-aa6c-4730-bca38cbd6825 |
56 rows × 3 columns
Finally, shut down the q server used for this demonstration:
proc.kill()
4. Run analytics on PyKX objects¶
Like many Python libraries (including Numpy and Pandas), PyKX provides many ways to run analytics on the data that you generate and define within the library. Let's explore the following:
- 4.1 Use in-built methods on PyKX Vectors
- 4.2 Use in-built methods on PyKX Tables
- 4.3 Use PyKX/q native functions
4.1 Use in-built methods on PyKX Vectors¶
When you interact with PyKX Vectors, you may wish to gain insights into these objects by applying basic analytics, such as calculating the mean, median, or mode of the vector:
q_vector = kx.random.random(1000, 10.0)
q_vector.mean()
pykx.FloatAtom(pykx.q('4.984157'))
q_vector.max()
pykx.FloatAtom(pykx.q('9.998212'))
The above is useful for basic analysis. For bespoke analytics on these vectors, use the apply method:
def bespoke_function(x, y):
    return x*y
q_vector.apply(bespoke_function, 5)
pykx.FloatVector(pykx.q('31.74132 38.3376 46.40922 10.17963 38.73944 48.33864 41.12562 45.44382 32.290..'))
4.2 Use in-built methods on PyKX Tables¶
In addition to the vector processing capabilities of PyKX, it's important to be able to manage tables. Described in depth in the Pandas-Like API documentation, these methods allow you to apply functions and gain insights into your data in a familiar way.
The example below uses combinations of the most used elements of this Table API operating on the following table:
N = 1000000
example_table = kx.Table(data = {
'sym' : kx.random.random(N, ['a', 'b', 'c']),
'col1' : kx.random.random(N, 10.0),
'col2' : kx.random.random(N, 20)
}
)
example_table
| | sym | col1 | col2 |
---|---|---|---|
0 | b | 7.782944 | 6 |
1 | c | 0.5899977 | 17 |
2 | c | 2.580528 | 8 |
3 | b | 5.651351 | 10 |
4 | b | 2.336329 | 11 |
5 | b | 2.87167 | 17 |
6 | c | 9.705893 | 9 |
7 | a | 5.729889 | 8 |
8 | c | 1.482026 | 14 |
... | ... | ... | ... |
999999 | c | 8.862285 | 6 |
1,000,000 rows × 3 columns
You can search for and filter data within your tables using loc, similarly to how this is done in Pandas:
example_table.loc[example_table['sym'] == 'a']
| | sym | col1 | col2 |
---|---|---|---|
0 | a | 5.729889 | 8 |
1 | a | 4.396508 | 13 |
2 | a | 0.7636906 | 19 |
3 | a | 9.904306 | 17 |
4 | a | 1.439738 | 10 |
5 | a | 2.898631 | 19 |
6 | a | 2.360396 | 2 |
7 | a | 1.932728 | 12 |
8 | a | 4.877998 | 4 |
... | ... | ... | ... |
332823 | a | 6.653308 | 18 |
332,824 rows × 3 columns
The same filtering works when you index a table directly, which goes through its __getitem__ method:
example_table[example_table['sym'] == 'b']
| | sym | col1 | col2 |
---|---|---|---|
0 | b | 7.782944 | 6 |
1 | b | 5.651351 | 10 |
2 | b | 2.336329 | 11 |
3 | b | 2.87167 | 17 |
4 | b | 2.917054 | 2 |
5 | b | 7.093562 | 18 |
6 | b | 1.715391 | 10 |
7 | b | 4.231884 | 0 |
8 | b | 4.727296 | 2 |
... | ... | ... | ... |
333014 | b | 9.361253 | 17 |
333,015 rows × 3 columns
Next, you can set the index columns of a table. In PyKX, this means converting the table from a pykx.Table object to a pykx.KeyedTable object:
example_table.set_index('sym')
sym | col1 | col2 |
---|---|---|
b | 7.782944 | 6 |
c | 0.5899977 | 17 |
c | 2.580528 | 8 |
b | 5.651351 | 10 |
b | 2.336329 | 11 |
b | 2.87167 | 17 |
c | 9.705893 | 9 |
a | 5.729889 | 8 |
c | 1.482026 | 14 |
... | ... | ... |
c | 8.862285 | 6 |
1,000,000 rows × 3 columns
Or you can apply basic aggregations such as mean and median:
print('mean:')
display(example_table.mean(numeric_only = True))
print('median:')
display(example_table.median(numeric_only = True))
mean:
col1 | 4.998412 |
---|---|
col2 | 9.497452 |
median:
col1 | 4.996685 |
---|---|
col2 | 9f |
Next, use the groupby method to group PyKX tabular data so you can use it for analytic purposes.
In the first example, group the dataset by the sym column and calculate the mean of each column within each sym group:
example_table.groupby('sym').mean()
sym | col1 | col2 |
---|---|---|
a | 5.00519 | 9.49375 |
b | 5.000742 | 9.501077 |
c | 4.989338 | 9.497527 |
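Conceptually, the grouped mean above follows a split, apply, combine pattern. The plain-Python sketch below shows that pattern on a small made-up dataset. It is an illustration of the idea rather than of how PyKX computes the result.

```python
from collections import defaultdict

# Small stand-in for example_table; values are made up for illustration.
sym = ['a', 'b', 'a', 'c', 'b', 'a']
col1 = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]

# Split: collect the col1 values belonging to each sym.
groups = defaultdict(list)
for key, value in zip(sym, col1):
    groups[key].append(value)

# Apply and combine: one mean per group, ordered by key.
means = {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}
print(means)  # {'a': 3.0, 'b': 4.0, 'c': 4.0}
```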
To extend the above groupby, consider a more complex example which uses numpy to run calculations on the PyKX data. You will notice later that you can simplify this specific use case further.
def apply_func(x):
    nparray = x.np()
    return np.sqrt(nparray).mean()
example_table.groupby('sym').apply(apply_func)
sym | col1 | col2 |
---|---|---|
a | 2.109397 | 2.859095 |
b | 2.108571 | 2.860037 |
c | 2.105694 | 2.859527 |
For time-series specific joining of data, use merge_asof joins. In this example, you have two tables with temporal information, namely a trades table and a quotes table:
trades = kx.Table(data={
"time": [
pd.Timestamp("2016-05-25 13:30:00.023"),
pd.Timestamp("2016-05-25 13:30:00.023"),
pd.Timestamp("2016-05-25 13:30:00.030"),
pd.Timestamp("2016-05-25 13:30:00.041"),
pd.Timestamp("2016-05-25 13:30:00.048"),
pd.Timestamp("2016-05-25 13:30:00.049"),
pd.Timestamp("2016-05-25 13:30:00.072"),
pd.Timestamp("2016-05-25 13:30:00.075")
],
"ticker": [
"GOOG",
"MSFT",
"MSFT",
"MSFT",
"GOOG",
"AAPL",
"GOOG",
"MSFT"
],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
quotes = kx.Table(data={
"time": [
pd.Timestamp("2016-05-25 13:30:00.023"),
pd.Timestamp("2016-05-25 13:30:00.038"),
pd.Timestamp("2016-05-25 13:30:00.048"),
pd.Timestamp("2016-05-25 13:30:00.048"),
pd.Timestamp("2016-05-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
print('trades:')
display(trades)
print('quotes:')
display(quotes)
trades:
| | time | ticker | bid | ask |
---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | GOOG | 720.5 | 720.93 |
1 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 51.96 |
2 | 2016.05.25D13:30:00.030000000 | MSFT | 51.97 | 51.98 |
3 | 2016.05.25D13:30:00.041000000 | MSFT | 51.99 | 52f |
4 | 2016.05.25D13:30:00.048000000 | GOOG | 720.5 | 720.93 |
5 | 2016.05.25D13:30:00.049000000 | AAPL | 97.99 | 98.01 |
6 | 2016.05.25D13:30:00.072000000 | GOOG | 720.5 | 720.88 |
7 | 2016.05.25D13:30:00.075000000 | MSFT | 52.01 | 52.03 |
quotes:
| | time | ticker | price | quantity |
---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 75 |
1 | 2016.05.25D13:30:00.038000000 | MSFT | 51.95 | 155 |
2 | 2016.05.25D13:30:00.048000000 | GOOG | 720.77 | 100 |
3 | 2016.05.25D13:30:00.048000000 | GOOG | 720.92 | 100 |
4 | 2016.05.25D13:30:00.048000000 | AAPL | 98f | 100 |
When applying the asof join, you can additionally use named arguments to distinguish which table each column originates from. In this case, the suffixes _trades and _quotes are used:
trades.merge_asof(quotes, on='time', suffixes=('_trades', '_quotes'))
| | time | ticker_trades | bid | ask | ticker_quotes | price | quantity |
---|---|---|---|---|---|---|---|
0 | 2016.05.25D13:30:00.023000000 | GOOG | 720.5 | 720.93 | MSFT | 51.95 | 75 |
1 | 2016.05.25D13:30:00.023000000 | MSFT | 51.95 | 51.96 | MSFT | 51.95 | 75 |
2 | 2016.05.25D13:30:00.030000000 | MSFT | 51.97 | 51.98 | MSFT | 51.95 | 75 |
3 | 2016.05.25D13:30:00.041000000 | MSFT | 51.99 | 52f | MSFT | 51.95 | 155 |
4 | 2016.05.25D13:30:00.048000000 | GOOG | 720.5 | 720.93 | AAPL | 98f | 100 |
5 | 2016.05.25D13:30:00.049000000 | AAPL | 97.99 | 98.01 | AAPL | 98f | 100 |
6 | 2016.05.25D13:30:00.072000000 | GOOG | 720.5 | 720.88 | AAPL | 98f | 100 |
7 | 2016.05.25D13:30:00.075000000 | MSFT | 52.01 | 52.03 | AAPL | 98f | 100 |
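The matching rule behind the result above can be sketched in plain Python: for each trade time, take the last quote whose time is less than or equal to it. Because the join is on time only, tickers are not matched, which is why ticker_trades and ticker_quotes can differ. The helper `asof_index` is a hypothetical name, and times are reduced to milliseconds after 13:30:00 for brevity.

```python
from bisect import bisect_right

# Times in milliseconds after 13:30:00, taken from the tables above.
trade_times = [23, 23, 30, 41, 48, 49, 72, 75]
quote_times = [23, 38, 48, 48, 48]  # must be sorted ascending
quote_qty = [75, 155, 100, 100, 100]

def asof_index(times, t):
    """Index of the last entry in `times` that is <= t, or None if there is none."""
    i = bisect_right(times, t) - 1
    return i if i >= 0 else None

matched = [asof_index(quote_times, t) for t in trade_times]
print(matched)  # [0, 0, 0, 1, 4, 4, 4, 4]
print([quote_qty[i] for i in matched])
# [75, 75, 75, 155, 100, 100, 100, 100]
```

The quantities printed agree with the quantity column of the joined table above.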
4.3 Use PyKX/q native functions¶
While the Pandas-like API and the methods available on PyKX Vectors provide an effective way to apply analytics to PyKX data, the most efficient and performant way to run analytics on your data is to use the PyKX/q primitives available through the kx.q module.
These include functionality for calculating moving averages, asof/window joins, column reversal, and more. Let's look at a few examples of how you can use these functions, grouped into the following sections:
- 4.3.1 Mathematical functions
- 4.3.2 Iteration functions
- 4.3.3 Table functions
4.3.1 Mathematical functions¶
mavg¶
Calculate a series of average values across a list using a rolling window:
kx.q.mavg(10, kx.random.random(10000, 2.0))
pykx.FloatVector(pykx.q('1.469756 1.029263 0.7352848 0.5950915 0.7071875 0.8486546 0.910078 0.95322 1...'))
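As a reference for what a rolling-window average computes, here is a minimal plain-Python sketch. It assumes that the first entries, before a full window is available, average over the items seen so far, which is consistent with the shape of the output above. The function name `moving_average` is illustrative.

```python
def moving_average(window, values):
    """Simple moving average; early entries average the items seen so far."""
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that has just left the window.
            running -= values[i - window]
        result.append(running / min(i + 1, window))
    return result

print(moving_average(3, [1, 2, 3, 4, 5]))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```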
cor¶
Calculate the correlation between two lists:
kx.q.cor([1, 2, 3], [2, 3, 4])
pykx.FloatAtom(pykx.q('1f'))
kx.q.cor(kx.random.random(100, 1.0), kx.random.random(100, 1.0))
pykx.FloatAtom(pykx.q('0.02687833'))
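For reference, the Pearson correlation coefficient can be written out in plain Python as below. This is a sketch of the standard formula, with `pearson` as an illustrative name, and it reproduces the first result above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

print(pearson([1, 2, 3], [2, 3, 4]))  # 1.0
```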
prds¶
Calculate the cumulative product across a supplied list:
kx.q.prds([1, 2, 3, 4, 5])
pykx.LongVector(pykx.q('1 2 6 24 120'))
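The same running product can be expressed with the Python standard library, which may help if you are more familiar with itertools. The variable name `running_products` is illustrative.

```python
from itertools import accumulate
from operator import mul

# Each output item is the product of all input items up to that position.
running_products = list(accumulate([1, 2, 3, 4, 5], mul))
print(running_products)  # [1, 2, 6, 24, 120]
```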
4.3.2 Iteration functions¶
each¶
Apply a function to each item of a list:
kx.q.each(kx.q('{prd x}'), kx.random.random([5, 5], 10.0, seed=10))
pykx.FloatVector(pykx.q('1033.597 377.1784 7126.713 418.3232 89.97531'))
kx.q('{prd x}').each(kx.random.random([5, 5], 10.0, seed=10))
pykx.FloatVector(pykx.q('1033.597 377.1784 7126.713 418.3232 89.97531'))
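In plain Python, applying a function to each item of a nested list corresponds to a list comprehension. The sketch below uses made-up values and math.prod in place of the q lambda {prd x}.

```python
from math import prod

rows = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Apply the function once per inner list, collecting one result per row.
products = [prod(row) for row in rows]
print(products)  # [6.0, 20.0, 6.0]
```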
4.3.3 Table functions¶
meta¶
Retrieve metadata about a table:
qtab = kx.Table(data = {
'x' : kx.random.random(1000, ['a', 'b', 'c']).grouped(),
'y' : kx.random.random(1000, 1.0),
'z' : kx.random.random(1000, kx.TimestampAtom.inf)
})
kx.q.meta(qtab)
c | t | f | a |
---|---|---|---|
x | "s" | | g |
y | "f" | | |
z | "p" | | |
xasc¶
Sort a table in ascending order of a specified column:
kx.q.xasc('z', qtab)
| | x | y | z |
---|---|---|---|
0 | c | 0.2660419 | 2000.09.17D00:27:33.222932480 |
1 | b | 0.2378591 | 2001.02.01D19:58:48.496586752 |
2 | c | 0.05802967 | 2001.05.29D15:29:16.181340160 |
3 | c | 0.9474748 | 2003.03.24D08:12:02.975653888 |
4 | b | 0.02726729 | 2004.01.31D07:25:21.959215104 |
5 | b | 0.08927731 | 2004.12.31D23:50:54.425055232 |
6 | c | 0.2256163 | 2005.07.12D10:45:38.423119872 |
7 | b | 0.1675316 | 2006.04.19D21:31:40.507750400 |
8 | b | 0.8185412 | 2006.05.28D15:22:24.331161600 |
... | ... | ... | ... |
999 | a | 0.4414727 | 2292.03.15D06:41:24.638662656 |
1,000 rows × 3 columns
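Sorting a column-oriented table by one column means computing an ordering from that column and applying it to every column. The plain-Python sketch below illustrates this on made-up data. The helper `sort_by` is a hypothetical name and not a PyKX function.

```python
# Column-oriented stand-in for qtab; values are made up for illustration.
table = {
    'x': ['c', 'b', 'a', 'b'],
    'z': [30, 10, 40, 20],
}

def sort_by(column, table):
    """Return a new table with every column reordered by `column` ascending."""
    order = sorted(range(len(table[column])), key=table[column].__getitem__)
    return {name: [values[i] for i in order] for name, values in table.items()}

print(sort_by('z', table))
# {'x': ['b', 'b', 'c', 'a'], 'z': [10, 20, 30, 40]}
```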
You can find the full list of these functions, with examples of their usage, in the PyKX documentation.