# PyKX introduction notebook

_The purpose of this notebook is to introduce you to PyKX capabilities and functionality._

For the best experience, visit [what is PyKX](../getting-started/what_is_pykx.html) and the [quickstart guide](../getting-started/quickstart.html) first.

To follow along, we recommend that you <a href="./interface-overview.ipynb" download>download the notebook</a>.

Now let's go through the following sections:

1. [Import PyKX](#1-import-pykx)
1. [Basic PyKX data structures](#2-basic-pykx-data-structures)
1. [Access and create PyKX objects](#3-access-and-create-pykx-objects)
1. [Run analytics on PyKX objects](#4-run-analytics-on-pykx-objects)

## 1. Import PyKX

To access PyKX and its functions, import it in your Python code as follows:

In [None]:
import os
os.environ['PYKX_IGNORE_QHOME'] = '1' # Ignore symlinking PyKX q libraries to QHOME
os.environ['PYKX_Q_LOADED_MARKER'] = '' # Only used here for running Notebook under mkdocs-jupyter during document generation.

In [None]:
import pykx as kx
kx.q.system.console_size = [10, 80]

Tip: We recommend always using `import pykx as kx`. The shortened import name `kx` makes the code more readable and is standard for the PyKX library.

Below we load additional libraries used throughout this notebook:

In [None]:
import numpy as np
import pandas as pd

## 2. Basic PyKX data structures

Central to your interaction with PyKX are the data types supported by the library. PyKX is built atop the `q` programming language, which provides small-footprint data structures for analytic calculations and for creating high-performance databases. The types shown below are generated from their Python equivalents.

This section describes the basic elements in the PyKX library and explains how and why they differ:

- 2.1 [Atom](#21-atom)
- 2.2 [Vector](#22-vector)
- 2.3 [List](#23-list)
- 2.4 [Dictionary](#24-dictionary)
- 2.5 [Table](#25-table)
- 2.6 [Other data types](#26-other-data-types)


### 2.1 Atom

In PyKX, an `atom` is a single irreducible value of a specific type. For example, you may come across `pykx.FloatAtom` or `pykx.DateAtom` objects, which can be generated from an equivalent Python representation as follows:

In [None]:
kx.FloatAtom(1.0)

In [None]:
from datetime import date
kx.DateAtom(date(2020, 1, 1))

### 2.2 Vector

Whereas an atom holds a single value, a PyKX `Vector` is a data structure holding multiple elements of a single type. Vectors, along with the lists described below, form the basis for most of the other important data structures that you will encounter, including dictionaries and tables.

Vector objects provide significant benefits over Python lists when applying analytics. Like NumPy, PyKX gains from the underlying speed of its analytic engine when operating on these strictly typed objects.

Vector objects are always 1-D and are indexed along a single axis.

In the following example, we create PyKX vectors from equivalent `numpy` and `pandas` objects:

In [None]:
kx.IntVector(np.array([1, 2, 3, 4], dtype=np.int32))

In [None]:
kx.toq(pd.Series([1, 2, 3, 4]))

### 2.3 List

A PyKX `List` is an untyped vector object. Unlike vectors, which are optimised for analytic performance, lists are mostly used for storing reference information or matrix data.

Unlike vector objects, which are 1-D in shape, lists can be ragged N-dimensional objects. This makes them useful for storing complex data structures, but limits their performance for data access and data modification tasks.

In [None]:
kx.List([[1, 2, 3], [1.0, 1.1, 1.2], ['a', 'b', 'c']])

### 2.4 Dictionary

A PyKX `Dictionary` is a mapping that associates each key with a value. The list of keys and the list of values must have the same count. While it can be thought of as a set of key-value pairs, it's physically stored as a pair of lists.

In [None]:
kx.Dictionary({'x': [1, 2, 3], 'x1': np.array([1, 2, 3])})
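
To picture the "pair of lists" storage described above, here is a plain-Python sketch. It is an analogy only and does not use or reproduce PyKX internals; the names `keys`, `values` and `lookup` are hypothetical.

```python
# Plain-Python picture of a dictionary held as two aligned lists.
# Hypothetical names; this is an analogy, not PyKX's implementation.
keys = ['x', 'x1']
values = [[1, 2, 3], [1, 2, 3]]

# Keys and values must have the same count.
assert len(keys) == len(values)


def lookup(key):
    """Return the value stored at the same position as `key`."""
    return values[keys.index(key)]


x1_values = lookup('x1')
```

The position of a key in the first list selects the matching entry in the second list, which is why the two counts have to agree.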

### 2.5 Table

PyKX `Tables` are first-class typed entities that live in memory. A table is a collection of named columns implemented as a dictionary. This mapping construct means that PyKX tables are column-oriented, which makes analytic operations on columns much faster than in a relational database equivalent.
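
The column-oriented layout can be sketched in plain Python as a dictionary that maps column names to equal-length lists. This is only an analogy under that assumption, not PyKX's internal representation; `columns` and `row` are names made up for the illustration.

```python
# A table pictured as a dictionary of named, equal-length columns.
# Illustrative only; the names here are hypothetical.
columns = {
    'col1': [1, 2, 3],
    'col2': [2, 3, 4],
    'col3': ['a', 'b', 'c'],
}

# A column aggregate only has to read one list ...
col2_total = sum(columns['col2'])


# ... whereas rebuilding a single row has to visit every column.
def row(index):
    return {name: values[index] for name, values in columns.items()}


first_row = row(0)
```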

PyKX Tables come in many forms, but the key table types are as follows:

 - `pykx.Table` 
 - `pykx.KeyedTable`
 - `pykx.SplayedTable`
 - `pykx.PartitionedTable`

This section demonstrates the first two, which are the in-memory table types.

#### pykx.Table

In [None]:
print(kx.Table([[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']], columns = ['col1', 'col2', 'col3']))

In [None]:
print(kx.Table(data = {'col1': [1, 2, 3], 'col2': [2 , 3, 4], 'col3': ['a', 'b', 'c']}))

The same tables can also be displayed directly, without `print`:

In [None]:
kx.Table([[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']],
         columns = ['col1', 'col2', 'col3'])

In [None]:
kx.Table(data = {
         'col1': [1, 2, 3],
         'col2': [2 , 3, 4],
         'col3': ['a', 'b', 'c']})

#### pykx.KeyedTable

A [pykx.KeyedTable](../api/pykx-q-data/wrappers.html#pykx.wrappers.KeyedTable) is created by setting one or more index columns on a `pykx.Table`:


In [None]:
kx.Table(data = {'x': [1, 2, 3], 'x1': [2, 3, 4], 'x2': ['a', 'b', 'c']}
         ).set_index(['x'])

### 2.6 Other data types

Below we outline some other important PyKX data types that you will run into throughout the rest of this notebook.

#### pykx.Lambda

A `pykx.Lambda` is the most basic kind of function within PyKX. Lambdas take between 0 and 8 parameters and are the building blocks for most analytics that users write when interacting with data from PyKX.

In [None]:
pykx_lambda = kx.q('{x+y}')
type(pykx_lambda)

In [None]:
pykx_lambda(1, 2)

#### pykx.Projection

Like [functools.partial](https://docs.python.org/3/library/functools.html#functools.partial), functions in PyKX can have some of their parameters set in advance, resulting in a new function, which is called a projection. When you call this projection, the preset parameters are no longer required and cannot be provided.

If the original function has `n` parameters in total and `m` of them are provided, the result is a function (projection) that requires the remaining `n-m` parameters.

In [None]:
projection = kx.q('{x+y}')(1)
projection

In [None]:
projection(2)
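
For comparison, here is the `functools.partial` analogy mentioned above, written out in plain Python. The function `add3` and the variable names are hypothetical, and the analogy is loose: `partial` is a Python construct and does not share every rule of a q projection.

```python
from functools import partial


def add3(x, y, z):
    """A function with n = 3 parameters."""
    return x + y + z


# Fix m = 2 of the parameters in advance; the new callable
# needs the remaining n - m = 1 parameter.
add_to_three = partial(add3, 1, 2)
total = add_to_three(10)
```

Here `total` is `1 + 2 + 10`, computed by supplying only the one parameter that was left open.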

## 3. Access and create PyKX objects

Now that you're familiar with the PyKX object types, let's see how they work in real-world scenarios, such as:

- 3.1 [Create PyKX objects from Pythonic data types](#31-create-pykx-objects-from-pythonic-data-types)
- 3.2 [Random data generation](#32-random-data-generation)
- 3.3 [Run q code to generate data](#33-run-q-code-to-generate-data)
- 3.4 [Read data from a CSV file](#34-read-data-from-a-csv-file)
- 3.5 [Query external processes via IPC](#35-query-external-processes-via-ipc)

### 3.1 Create PyKX objects from Pythonic data types

One of the most common ways to generate PyKX data is by converting equivalent Pythonic data types. PyKX natively supports conversions to and from the following common Python data formats:

- Python
- Numpy
- Pandas
- PyArrow

You can generate PyKX objects by using the `kx.toq` PyKX function:

In [None]:
pydict = {'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': 2}
kx.toq(pydict)

In [None]:
nparray = np.array([1, 2, 3, 4], dtype = np.int32)
kx.toq(nparray)

In [None]:
pdframe = pd.DataFrame(data = {'a':[1, 2, 3], 'b': ['a', 'b', 'c']})
kx.toq(pdframe)

### 3.2 Random data generation

PyKX provides a module to create random data of user-specified PyKX types or their equivalent Python types. The creation of random data helps in prototyping analytics.

As a first example, generate a list of 1,000,000 random floating-point values between 0 and 1 as follows:

In [None]:
kx.random.random(1000000, 1.0)

If you wish to choose values randomly from a list, pass the list as the second argument:

In [None]:
kx.random.random(5, [kx.LongAtom(1), ['a', 'b', 'c'], np.array([1.1, 1.2, 1.3])])

Random data does not only come in 1-dimensional forms. To create multi-dimensional PyKX Lists, pass a list of dimensions as the first argument. The following examples also use a PyKX trick: passing a null or an infinity as the second argument generates random data across the full allowable range of that type:

In [None]:
kx.random.random([2, 5], kx.GUIDAtom.null)

In [None]:
kx.random.random([2, 3, 4], kx.IntAtom.inf)

Finally, to make the generated objects reproducible, set the seed for the random data generation explicitly. You can do this globally or for individual function calls:

In [None]:
kx.random.seed(10)
kx.random.random(10, 2.0)

In [None]:
kx.random.random(10, 2.0, seed = 10)
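
The difference between global and per-call seeding can be illustrated with Python's standard `random` module as a stand-in. This sketch does not call PyKX and makes no claim about how `kx.random` manages its state; `seeded_sample` is a hypothetical helper.

```python
import random

# Global seeding: reseeding with the same value replays the same sequence.
random.seed(10)
first = [random.random() for _ in range(3)]
random.seed(10)
second = [random.random() for _ in range(3)]


# Per-call seeding: a private generator is created for each call,
# so the module-level generator is not touched.
def seeded_sample(n, seed):
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


repeat_a = seeded_sample(3, seed=10)
repeat_b = seeded_sample(3, seed=10)
```

In both styles the point is the same: identical seeds give identical data, which makes prototype analytics repeatable.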

### 3.3 Run q code to generate data

PyKX is an entry point to the vector programming language q. This means that PyKX users can execute q code directly via PyKX within a Python session, by calling `kx.q`.

For example, to create q data, run the following command:

In [None]:
kx.q('0 1 2 3 4')

In [None]:
kx.q('([idx:desc til 5]col1:til 5;col2:5?1f;col3:5?`2)')

Next, pass arguments to a user-specified function that computes `x+y`:

In [None]:
kx.q('{x+y}', kx.LongAtom(1), kx.LongAtom(2))

### 3.4 Read data from a CSV file

A lot of the data that you run into in data analysis tasks comes in the form of CSV files. PyKX, like Pandas, provides a CSV reader, called via `kx.q.read.csv`. In the next cell we create a CSV file that can be read by PyKX:

In [None]:
import csv

with open('pykx.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    field = ["name", "age", "height", "country"]
    
    writer.writerow(field)
    writer.writerow(["Oladele Damilola", "40", "180.0", "Nigeria"])
    writer.writerow(["Alina Hricko", "23", "179.2", "Ukraine"])
    writer.writerow(["Isabel Walter", "50", "179.5", "United Kingdom"])

In [None]:
kx.q.read.csv('pykx.csv', types = {'age': kx.LongAtom, 'country': kx.SymbolAtom})

In [None]:
import os
os.remove('pykx.csv')
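
CSV fields arrive as plain text, which is why a reader needs per-column type information such as the `types` argument used above. The following standard-library sketch shows the idea on the same rows; it does not call PyKX, and the `converters` mapping is a hypothetical stand-in for a type specification.

```python
import csv
import io

# The same rows as the file written above, held in memory for this sketch.
raw = io.StringIO(
    "name,age,height,country\n"
    "Oladele Damilola,40,180.0,Nigeria\n"
    "Alina Hricko,23,179.2,Ukraine\n"
    "Isabel Walter,50,179.5,United Kingdom\n"
)

# Hypothetical per-column converters; unlisted columns stay as text.
converters = {'age': int, 'height': float}

rows = [
    {name: converters.get(name, str)(value) for name, value in record.items()}
    for record in csv.DictReader(raw)
]
ages = [r['age'] for r in rows]
```

Without the converters every value, including `age`, would remain a string and numeric analytics on the column would not be possible.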

### 3.5 Query external processes via IPC

One of the most common usage patterns in organizations with access to data in kdb+/q is to query data from external server processes. For the example below you need to [install q](https://kx.com/kdb-insights-personal-edition-license-download/).

First, set up a q/kdb+ server on port 5000. Once connected, you will populate it with some data in the form of a table `tab`:

In [None]:
import subprocess
import time

try:
    with kx.PyKXReimport():
        proc = subprocess.Popen(
            ('q', '-p', '5000')
        )
    time.sleep(2)
except Exception as err:
    # Chain the original error so the root cause stays visible
    raise kx.QError('Unable to create q process on port 5000') from err
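
The fixed `time.sleep(2)` above is simple, but it can be too short on a slow machine or needlessly long on a fast one. One alternative, sketched here with only the standard library, is to poll the port until it accepts connections. `wait_for_port` is a hypothetical helper and not part of PyKX.

```python
import socket
import time


def wait_for_port(port, host='localhost', timeout=5.0, interval=0.1):
    """Return True once `host:port` accepts a TCP connection, else False.

    Polls every `interval` seconds until `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

With such a helper, the sleep could be replaced by a check such as `wait_for_port(5000)` before opening the connection.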

Once a q process is available, connect to it for synchronous query execution:

In [None]:
conn = kx.SyncQConnection(port = 5000)

You can now run q commands against the q server:

In [None]:
conn('tab:([]col1:100?`a`b`c;col2:100?1f;col3:100?0Ng)')
conn('select from tab where col1=`a')

Alternatively, use the PyKX query API:

In [None]:
conn.qsql.select('tab', where=['col1=`a', 'col2<0.3'])

Or use PyKX's context interface to run SQL server side if you have access to it:

In [None]:
conn(r'\l s.k_')  # raw string so that '\l' is not treated as an escape sequence
conn.sql('SELECT * FROM tab where col2>=0.5')

Finally, shut down the q server used for this demonstration:

In [None]:
proc.kill()

---

## 4. Run analytics on PyKX objects

Like many Python libraries (including NumPy and Pandas), PyKX provides several ways to run analytics on its data, using both built-in methods and functions that you define yourself. Let's explore the following:

- 4.1 [Use in-built methods on PyKX Vectors](#41-use-in-built-methods-on-pykx-vectors)
- 4.2 [Use in-built methods on PyKX Tables](#42-use-in-built-methods-on-pykx-tables)
- 4.3 [Use PyKX/q native functions](#43-use-pykxq-native-functions)


### 4.1 Use in-built methods on PyKX Vectors

When you interact with PyKX Vectors, you may wish to gain insights into these objects by applying basic analytics, such as calculating the `mean`, `median`, or `mode` of the vector:

In [None]:
q_vector = kx.random.random(1000, 10.0)

In [None]:
q_vector.mean()

In [None]:
q_vector.max()

The above is useful for basic analysis. For bespoke analytics on these vectors, use the `apply` method:

In [None]:
def bespoke_function(x, y):
    return x*y

q_vector.apply(bespoke_function, 5)

### 4.2 Use in-built methods on PyKX Tables

In addition to the vector processing capabilities of PyKX, it's important to be able to manage tables. Described in depth in the [Pandas-like API documentation](../user-guide/advanced/Pandas_API.ipynb), these methods allow you to apply functions and gain insights into your data in a familiar way.

The example below uses combinations of the most used elements of this Table API operating on the following table:

In [None]:
N = 1000000
example_table = kx.Table(data = {
    'sym' : kx.random.random(N, ['a', 'b', 'c']),
    'col1' : kx.random.random(N, 10.0),
    'col2' : kx.random.random(N, 20)
    }
)
example_table

You can search for and filter data within your tables using `loc`, similarly to how this is done in Pandas:

In [None]:
example_table.loc[example_table['sym'] == 'a']

The same filtering applies when you index a table directly, which goes through the `__getitem__` method:

In [None]:
example_table[example_table['sym'] == 'b']

Next, you can set the index columns of a table. In PyKX, this means converting the table from a `pykx.Table` object to a `pykx.KeyedTable` object:

In [None]:
example_table.set_index('sym')

Or you can apply basic aggregations such as `mean` and `median`:

In [None]:
print('mean:')
display(example_table.mean(numeric_only = True))

print('median:')
display(example_table.median(numeric_only = True))

Next, use the `groupby` method to group PyKX tabular data so you can use it for analytic purposes.

In the first example, group the dataset by the `sym` column and calculate the `mean` of each column for every `sym` value:

In [None]:
example_table.groupby('sym').mean()

To extend the above `groupby`, consider a more complex example which uses `numpy` to run calculations on the PyKX data. You will notice later that you can simplify this specific use-case further.

In [None]:
def apply_func(x):
    nparray = x.np()
    return np.sqrt(nparray).mean()

example_table.groupby('sym').apply(apply_func)
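
To make the split-apply-combine steps behind such a grouped `apply` explicit, here is a plain-Python sketch on a tiny hand-written dataset. It does not use PyKX; the data and the `sqrt_mean` helper are hypothetical and chosen so the results are easy to check by hand.

```python
import math
from collections import defaultdict

# Tiny illustrative dataset: one key column and one value column.
syms = ['a', 'b', 'a', 'c', 'b']
col1 = [4.0, 9.0, 16.0, 25.0, 1.0]

# Split: collect the values that belong to each group key.
groups = defaultdict(list)
for sym, value in zip(syms, col1):
    groups[sym].append(value)


# Apply: the mean of the square roots of one group's values.
def sqrt_mean(values):
    return sum(math.sqrt(v) for v in values) / len(values)


# Combine: one result per group key.
result = {sym: sqrt_mean(values) for sym, values in groups.items()}
```

For group `a` the square roots are 2 and 4, so its entry in `result` is 3.0.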

For time-series specific joining of data, use `merge_asof` joins. In this example, you have two tables with temporal information, namely a `trades` table and a `quotes` table:

In [None]:
trades = kx.Table(data={
    "time": [
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.023"),
        pd.Timestamp("2016-05-25 13:30:00.030"),
        pd.Timestamp("2016-05-25 13:30:00.041"),
        pd.Timestamp("2016-05-25 13:30:00.048"),
        pd.Timestamp("2016-05-25 13:30:00.049"),
        pd.Timestamp("2016-05-25 13:30:00.072"),
        pd.Timestamp("2016-05-25 13:30:00.075")
    ],
    "ticker": [
       "GOOG",
       "MSFT",
       "MSFT",
       "MSFT",
       "GOOG",
       "AAPL",
       "GOOG",
       "MSFT"
   ],
   "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
   "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
quotes = kx.Table(data={
   "time": [
       pd.Timestamp("2016-05-25 13:30:00.023"),
       pd.Timestamp("2016-05-25 13:30:00.038"),
       pd.Timestamp("2016-05-25 13:30:00.048"),
       pd.Timestamp("2016-05-25 13:30:00.048"),
       pd.Timestamp("2016-05-25 13:30:00.048")
   ],
   "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
   "price": [51.95, 51.95, 720.77, 720.92, 98.0],
   "quantity": [75, 155, 100, 100, 100]
})

print('trades:')
display(trades)
print('quotes:')
display(quotes)

When applying the as-of join, you can additionally use the `suffixes` named argument to distinguish which table the overlapping columns originate from. In this case, the suffixes are `_trades` and `_quotes`:

In [None]:
trades.merge_asof(quotes, on='time', suffixes=('_trades', '_quotes'))
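
The matching rule behind an as-of join can be sketched in plain Python. The sketch assumes a backward-looking match (each left-hand row takes the latest right-hand row at or before its time), ignores per-ticker grouping and suffix handling, and uses integer millisecond offsets instead of timestamps. `asof_join` and the sample rows are hypothetical and do not call PyKX.

```python
from bisect import bisect_right


def asof_join(left, right, on):
    """Attach to each left row the latest right row whose key is <= the left key.

    Both inputs are lists of dicts; `right` must already be sorted by `on`.
    Left rows with no earlier right row are returned unchanged.
    """
    right_keys = [row[on] for row in right]
    joined = []
    for row in left:
        pos = bisect_right(right_keys, row[on])
        match = right[pos - 1] if pos else {}
        # Left-hand values win where column names collide (including `on`).
        joined.append({**match, **row})
    return joined


left = [{'time': 10}, {'time': 23}, {'time': 41}, {'time': 75}]
right = [
    {'time': 23, 'quantity': 75},
    {'time': 38, 'quantity': 155},
    {'time': 48, 'quantity': 100},
]
joined = asof_join(left, right, on='time')
quantities = [row.get('quantity') for row in joined]
```

The row at time 10 has no earlier match and gets no `quantity`, while the row at time 41 picks up the value recorded at time 38.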

### 4.3 Use PyKX/q native functions

While the Pandas-like API and the methods provided on PyKX Vectors offer an effective way of applying analytics to PyKX data, the most efficient and performant way to run analytics on your data is to use the PyKX/q primitives available through the `kx.q` module.

These include functionality for calculating moving averages, asof/window joins, column reversal and more. Now let's see a few examples of how you can use these functions, grouped into the following sections:

- 4.3.1 [Mathematical functions](#431-mathematical-functions)
- 4.3.2 [Iteration functions](#432-iteration-functions)
- 4.3.3 [Table functions](#433-table-functions)

#### 4.3.1 Mathematical functions

##### mavg

Calculate a series of average values across a list using a rolling window:

In [None]:
kx.q.mavg(10, kx.random.random(10000, 2.0))
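
As a reference for what a rolling-window average computes, here is a plain-Python sketch. It assumes that the leading positions, where fewer than `window` items are available, average over the items seen so far. `moving_average` is a hypothetical helper and does not call PyKX.

```python
def moving_average(window, values):
    """Average of the last `window` items at each position.

    Early positions average over however many items exist so far.
    """
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the item that just left the window.
            running -= values[i - window]
        averages.append(running / min(i + 1, window))
    return averages


averages = moving_average(3, [1, 2, 3, 5, 7, 10])
```

Keeping a running sum means each position costs one addition and at most one subtraction, regardless of the window size.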

##### cor

Calculate the correlation between two lists:

In [None]:
kx.q.cor([1, 2, 3], [2, 3, 4])

In [None]:
kx.q.cor(kx.random.random(100, 1.0), kx.random.random(100, 1.0))
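
For reference, the Pearson correlation coefficient can be written out in plain Python as follows. This is a sketch of the standard formula, with `pearson` as a hypothetical helper; it does not call PyKX.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


perfect = pearson([1, 2, 3], [2, 3, 4])
inverse = pearson([1, 2, 3], [3, 2, 1])
```

Two lists that differ only by a constant offset are perfectly correlated, while reversing one of them flips the sign of the coefficient.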

##### prds

Calculate the cumulative product across a supplied list:

In [None]:
kx.q.prds([1, 2, 3, 4, 5])
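
The cumulative product can be reproduced in plain Python with `itertools.accumulate`, shown here only to make the running-product idea concrete:

```python
from itertools import accumulate
from operator import mul

# Each position holds the product of all items up to and including it.
running_products = list(accumulate([1, 2, 3, 4, 5], mul))
```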

#### 4.3.2 Iteration functions

##### each

Supplied both as a standalone primitive and as a method on PyKX Lambdas, `each` allows you to pass individual elements of a PyKX object to a function:

In [None]:
kx.q.each(kx.q('{prd x}'), kx.random.random([5, 5], 10.0, seed=10))

In [None]:
kx.q('{prd x}').each(kx.random.random([5, 5], 10.0, seed=10))
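
In plain-Python terms, applying a product function to each row of a nested list looks like the sketch below. The `rows` data is hypothetical and the code does not call PyKX.

```python
import math

# A small nested list standing in for a 2-D PyKX List.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Apply the function to each top-level element (each row) in turn.
products = [math.prod(row) for row in rows]
```

The function sees one row at a time, so the result has one value per row rather than a single product over all the data.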

#### 4.3.3 Table functions

##### meta

Retrieve metadata information about a table:

In [None]:
qtab = kx.Table(data = {
    'x' : kx.random.random(1000, ['a', 'b', 'c']).grouped(),
    'y' : kx.random.random(1000, 1.0),
    'z' : kx.random.random(1000, kx.TimestampAtom.inf)
})

In [None]:
kx.q.meta(qtab)

##### xasc

Sort the rows of a table in ascending order of a specified column:

In [None]:
kx.q.xasc('z', qtab)
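
Sorting a column-oriented table by one column means reordering every column with the same permutation. The plain-Python sketch below shows that idea on a dictionary of columns; `sort_table` and the sample data are hypothetical and the code does not call PyKX.

```python
def sort_table(column, table):
    """Return a copy of `table` with rows ordered by `column`, ascending."""
    # Row indices in the order that sorts the chosen column.
    order = sorted(range(len(table[column])), key=table[column].__getitem__)
    # Apply the same row order to every column so rows stay intact.
    return {name: [values[i] for i in order] for name, values in table.items()}


table = {'x': ['b', 'a', 'c'], 'z': [3, 1, 2]}
sorted_table = sort_table('z', table)
```

Note that `x` is reordered along with `z`, so each row keeps its original pairing of values.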

You can find the full list of the functions and some examples of their usage [here](../api/pykx-execution/q.md).

