{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PyKX Introduction Notebook\n",
    "\n",
    "The purpose of this notebook is to provide an introduction to the capabilities and functionality made available to you with PyKX.\n",
    "\n",
    "To follow along please download this notebook using the following <a href=\"./PyKX%20Introduction%20Notebook.ipynb\" download>'link.'</a>\n",
    "\n",
    "This Notebook is broken into the following sections\n",
    "\n",
    "1. [How to import PyKX](#How-to-import-Pykx)\n",
    "1. [The basic data structures of PyKX](#The-basic-data-structures-of-PyKX)\n",
    "1. [Accessing and creating PyKX objects](#Accessing-and-creating-PyKX-objects)\n",
    "1. [Running analytics on objects in PyKX](#Running-analytics-on-objects-in-PyKX)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Welcome to PyKX!\n",
    "\n",
    "PyKX is a Python library built and maintained for interfacing seamlessly with the worlds fastest time-series database technology kdb+ and it's underlying vector programming language q.\n",
    "\n",
    "It's aim is to provide you and all Python data-engineers and data-scientist with an interface to efficiently apply analytics on large volumes of on-disk and in-memory data, in a fraction of the time of competitor libraries.\n",
    "\n",
    "## How to import PyKX\n",
    "\n",
    "To access PyKX and it's functions import it in your Python code as follows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide_code"
    ]
   },
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ['IGNORE_QHOME'] = '1' # Ignore symlinking PyKX q libraries to QHOME\n",
    "os.environ['PYKX_Q_LOADED_MARKER'] = '' # Only used here for running Notebook under mkdocs-jupyter during document generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pykx as kx\n",
    "kx.q.system.console_size = [10, 80]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The shortening of the import name to `kx` is done for readability of code that uses PyKX and is the intended standard for the library. As such we recommend that you always use `import pykx as kx` when using the library.\n",
    "\n",
    "Below we load additional libraries used through this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The basic data structures of PyKX\n",
    "\n",
    "Central to your interaction with PyKX are the various data types that are supported by the library, fundamentally PyKX is built atop a fully featured functional programming language `q` which provides small footprint data structures that can be used in analytic calculations and the creation of highly performant databases. The types we show below are generated from Python equivalent types but as you will see through this notebook \n",
    "\n",
    "In this section we will describe the basic elements which you will come in contact with as you traverse the library and explain why/how they are different.\n",
    "\n",
    "### PyKX Atomic Types\n",
    "\n",
    "In PyKX an atom denotes a single irreducible value of a specific type, for example you may come across `pykx.FloatAtom` or `pykx.DateAtom` objects generated as follows which may have been generated as follows from an equivalent Pythonic representation. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.FloatAtom(1.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import date\n",
    "kx.DateAtom(date(2020, 1, 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PyKX Vector Types\n",
    "\n",
    "Similar to atoms, vectors are a data structure composed of a collection of multiple elements of a single specified type. These objects in PyKX along with lists described below form the basis for the majority of the other important data structures that you will encounter including dictionaries and tables.\n",
    "\n",
    "Typed vector objects provide significant benefits when it comes to the applications of analytics over Python lists for example. Similar to Numpy, PyKX gains from the underlying speed of it's analytic engine when operating on these strictly typed objects.\n",
    "\n",
    "Vector type objects are always 1-D and as such are/can be indexed along a single axis.\n",
    "\n",
    "In the following example we are creating PyKX vectors from common Python equivalent `numpy` and `pandas` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.IntVector(np.array([1, 2, 3, 4], dtype=np.int32))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.toq(pd.Series([1, 2, 3, 4]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PyKX Lists\n",
    "\n",
    "A `List` in PyKX can loosely be described as an untyped vector object. Unlike vectors which are optimised for the performance of analytics, lists are more commonly used for storing reference information or matrix data.\n",
    "\n",
    "Unlike vector objects which are by definition 1-D in shape, lists can be ragged N-Dimensional objects. This makes them useful for the storage of some complex data structures but limits their performance when dealing with data-access/data modification tasks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.List([[1, 2, 3], [1.0, 1.1, 1.2], ['a', 'b', 'c']])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PyKX Dictionaries\n",
    "\n",
    "A dictionary in PyKX is defined as a mapping between a direct key-value mapping, the list of keys and values to which they are associated must have the same count. While it can be considered as a key-value pair, it is physically stored as a pair of lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(kx.Dictionary({'x': [1, 2, 3], 'x1': np.array([1, 2, 3])}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PyKX Tables\n",
    "\n",
    "Tables in PyKX are a first-class typed entity which live in memory. They can be fundamentally described as a collection of named columns implemented as a dictionary. This mapping construct means that tables in PyKX are column-oriented which makes analytic operations on specified columns much faster than would be the case for a relational database equivalent.\n",
    "\n",
    "Tables in PyKX come in many forms but the key table types are as follows\n",
    "\n",
    "- `pykx.Table` \n",
    "- `pykx.KeyedTable`\n",
    "- `pykx.SplayedTable`\n",
    "- `pykx.PartitionedTable`\n",
    "\n",
    "In this section we will deal only with the first two of these which constitute specifically the in-memory data table types. As will be discussed in later sections `Splayed` and `Partitioned` tables are memory-mapped on-disk data structures, these are derivations of the `pykx.Table` and `pykx.KeyedTable` type objects.\n",
    "\n",
    "#### `pykx.Table`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(kx.Table([[1, 2, 'a'], [2, 3, 'b'], [3, 4, 'c']], columns = ['col1', 'col2', 'col3']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(kx.Table(data = {'col1': [1, 2, 3], 'col2': [2 , 3, 4], 'col3': ['a', 'b', 'c']}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `pykx.KeyedTable`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.Table(data = {'x': [1, 2, 3], 'x1': [2, 3, 4], 'x2': ['a', 'b', 'c']}).set_index(['x'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Other Data Types\n",
    "\n",
    "The above types outline the majority of the important type structures in PyKX but there are many others which you will encounter as you use the library, below we have outlined some of the important ones that you will run into through the rest of this notebook.\n",
    "\n",
    "#### `pykx.Lambda`\n",
    "\n",
    "A `pykx.Lambda` is the most basic kind of function within PyKX. They take between 0 and 8 parameters and are the building blocks for most analytics written by users when interacting with data from PyKX."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pykx_lambda = kx.q('{x+y}')\n",
    "type(pykx_lambda)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pykx_lambda(1, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `pykx.Projection`\n",
    "\n",
    "Similar to [functools.partial](https://docs.python.org/3/library/functools.html#functools.partial), functions in PyKX can have some of their parameters fixed in advance, resulting in a new function, which is called a projection. When this projection is called, the fixed parameters are no longer required, and cannot be provided.\n",
    "\n",
    "If the original function had `n` total parameters, and it had `m` provided, the result would be a function (projection) that requires a user to input `n-m` parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "projection = kx.q('{x+y}')(1)\n",
    "projection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "projection(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Accessing and creating PyKX objects\n",
    "\n",
    "Now that we have seen some of the PyKX object types that you will encounter, practically speaking how will they be created in real-world scenarios?\n",
    "\n",
    "### Creating PyKX objects from Pythonic data types\n",
    "\n",
    "One of the most common ways that PyKX data is generated is through conversions from equivalent Pythonic data types. PyKX natively supports conversions to and from the following common Python data formats.\n",
    "\n",
    "- Python\n",
    "- Numpy\n",
    "- Pandas\n",
    "- PyArrow\n",
    "\n",
    "In each of the above cases generation of PyKX objects is facilitated through the use of the `kx.toq` PyKX function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pydict = {'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': 2}\n",
    "kx.toq(pydict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nparray = np.array([1, 2, 3, 4], dtype = np.int32)\n",
    "kx.toq(nparray)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pdframe = pd.DataFrame(data = {'a':[1, 2, 3], 'b': ['a', 'b', 'c']})\n",
    "kx.toq(pdframe)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Random data generation\n",
    "\n",
    "PyKX provides users with a module for the creation of random data of user specified PyKX types or their equivalent Python types. The creation of random data is useful in prototyping analytics and is used extensively within our documentation when creating test examples.\n",
    "\n",
    "As a first example you can generate a list of 1,000,000 random floating point values between 0 and 1 as follows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.random.random(1000000, 1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If instead you wish to choose values randomly from a list, this can be facilitated by using the list as the second argument to your function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.random.random(5, [kx.LongAtom(1), ['a', 'b', 'c'], np.array([1.1, 1.2, 1.3])])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Random data does not only come in 1-Dimensional forms however and modifications to the first argument to be a list allow you to create multi-Dimensional PyKX Lists. The below examples are additionally using a PyKX trick where nulls/infinities can be used to generate random data across the full allowable range"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.random.random([2, 5], kx.GUIDAtom.null)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.random.random([2, 3, 4], kx.IntAtom.inf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, users can set the seed for the random data generation explicitly allowing users to have consistency over the generated objects. This can be completed globally or for individual function calls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.random.seed(10)\n",
    "kx.random.random(10, 2.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.random.random(10, 2.0, seed = 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Running q code to generate data\n",
    "\n",
    "As mentioned in the introduction PyKX provides an entrypoint to the vector programming language q, as such users of PyKX can execute q code directly via PyKX within a Python session. This is facilitated through use of calls to `kx.q`.\n",
    "\n",
    "Create some q data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q('0 1 2 3 4')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q('([idx:desc til 5]col1:til 5;col2:5?1f;col3:5?`2)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Apply arguments to a user specified function `x+y`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q('{x+y}', kx.LongAtom(1), kx.LongAtom(2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read data from a CSV file\n",
    "\n",
    "A lot of data that you run into for data analysis tasks comes in the form of CSV files, PyKX similar to Pandas provides a CSV reader called via `kx.q.read.csv`, in the following cell we will create a CSV to be read in using PyKX"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "with open('pykx.csv', 'w', newline='') as file:\n",
    "    writer = csv.writer(file)\n",
    "    field = [\"name\", \"age\", \"height\", \"country\"]\n",
    "    \n",
    "    writer.writerow(field)\n",
    "    writer.writerow([\"Oladele Damilola\", \"40\", \"180.0\", \"Nigeria\"])\n",
    "    writer.writerow([\"Alina Hricko\", \"23\", \"179.2\", \"Ukraine\"])\n",
    "    writer.writerow([\"Isabel Walter\", \"50\", \"179.5\", \"United Kingdom\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.read.csv('pykx.csv', types = {'age': kx.LongAtom, 'country': kx.SymbolAtom})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.remove('pykx.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Querying external Processes via IPC\n",
    "\n",
    "One of the most common usage patterns in organisations with access to data in kdb+/q you will encounter is to query this data from an external server process infrastructure. In the example below we assume that you have q installed in addition to PyKX, see [here](https://kx.com/kdb-insights-personal-edition-license-download/) to install q alongside the license access for PyKX.\n",
    "\n",
    "First we set up a q/kdb+ server setting it on port 5050 and populating it with some data in the form of a table `tab`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import subprocess\n",
    "import time\n",
    "\n",
    "try:\n",
    "    with kx.PyKXReimport():\n",
    "        proc = subprocess.Popen(\n",
    "            ('q', '-p', '5000')\n",
    "        )\n",
    "    time.sleep(2)\n",
    "except:\n",
    "    raise kx.QError('Unable to create q process on port 5000')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once a q process is available you can establish a connection to it for synchronous query execution as follows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conn = kx.SyncQConnection(port = 5000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can now run q commands against the q server"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conn('tab:([]col1:100?`a`b`c;col2:100?1f;col3:100?0Ng)')\n",
    "conn('select from tab where col1=`a')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or use the PyKX query API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.qsql.select('tab', where=['col1=`a', 'col2<0.3'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or use PyKX's context interface to run SQL server side if it's available to you"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conn('\\l s.k_')\n",
    "conn.sql('SELECT * FROM tab where col2>=0.5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally the q server used for this demonstration can be shut down"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "proc.kill()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Running analytics on objects in PyKX\n",
    "\n",
    "Like many Python libraries including Numpy and Pandas PyKX provides a number of ways that it's data can be used with analytics defined internal to the library and which you have self generated.\n",
    "\n",
    "### Using in-built methods on PyKX Vectors\n",
    "\n",
    "When you are interacting with PyKX Vectors you may wish to gain insights into these objects through the application of basic analytics such as calculation of the `mean`/`median`/`mode` of the vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q_vector = kx.random.random(1000, 10.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q_vector.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q_vector.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above is useful for basic analysis but will not be sufficient for more bespoke analytics on these vectors, to allow you more control over the analytics run you can also use the `apply` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def bespoke_function(x, y):\n",
    "    return x*y\n",
    "\n",
    "q_vector.apply(bespoke_function, 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using in-built methods on PyKX Tables\n",
    "\n",
    "In addition to the vector processing capabilities of PyKX your ability to operate on Tabular structures is also important. Highlighted in greater depth within the Pandas-Like API documentation [here](../user-guide/advanced/Pandas_API.ipynb) these methods allow you to apply functions and gain insights into your data in a way that is familiar.\n",
    "\n",
    "In the below example you will use combinations of the most commonly used elements of this Table API operating on the following table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 1000000\n",
    "example_table = kx.Table(data = {\n",
    "    'sym' : kx.random.random(N, ['a', 'b', 'c']),\n",
    "    'col1' : kx.random.random(N, 10.0),\n",
    "    'col2' : kx.random.random(N, 20)\n",
    "    }\n",
    ")\n",
    "example_table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can search for and filter data within your tables using `loc` similarly to how this is achieved by Pandas as follows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "example_table.loc[example_table['sym'] == 'a']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This behavior also is incorporated when retrieving data from a table through the `__get__` method as you can see here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "example_table[example_table['sym'] == 'b']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can additionally set the index columns of the table, when dealing with PyKX tables this converts the table from a `pykx.Table` object to a `pykx.KeyedTable` object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "example_table.set_index('sym')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additional to basic data manipulation such as index setting you also get access to analytic capabilities such as the application of basic data manipulation operations such as `mean` and `median` as demonstrated here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('mean:')\n",
    "print(example_table.mean(numeric_only = True))\n",
    "\n",
    "print('median:')\n",
    "print(example_table.median(numeric_only = True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can make use of the `groupby` method which groups the PyKX tabular data which can then be used for analytic application.\n",
    "\n",
    "In your first example let's start by grouping the dataset based on the `sym` column and then calculating the `mean` for each column based on their `sym`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "example_table.groupby('sym').mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an extension to the above groupby you can now consider a more complex example which is making use of `numpy` to run some calculations on the PyKX data, you will see later that this can be simplified further in this specific use-case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def apply_func(x):\n",
    "    nparray = x.np()\n",
    "    return np.sqrt(nparray).mean()\n",
    "\n",
    "example_table.groupby('sym').apply(apply_func)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Time-series specific joining of data can be completed using `merge_asof` joins. In this example a number of tables with temporal information namely a `trades` and `quotes` table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trades = kx.Table(data={\n",
    "    \"time\": [\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.023\"),\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.023\"),\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.030\"),\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.041\"),\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.048\"),\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.049\"),\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.072\"),\n",
    "        pd.Timestamp(\"2016-05-25 13:30:00.075\")\n",
    "    ],\n",
    "    \"ticker\": [\n",
    "       \"GOOG\",\n",
    "       \"MSFT\",\n",
    "       \"MSFT\",\n",
    "       \"MSFT\",\n",
    "       \"GOOG\",\n",
    "       \"AAPL\",\n",
    "       \"GOOG\",\n",
    "       \"MSFT\"\n",
    "   ],\n",
    "   \"bid\": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],\n",
    "   \"ask\": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]\n",
    "})\n",
    "quotes = kx.Table(data={\n",
    "   \"time\": [\n",
    "       pd.Timestamp(\"2016-05-25 13:30:00.023\"),\n",
    "       pd.Timestamp(\"2016-05-25 13:30:00.038\"),\n",
    "       pd.Timestamp(\"2016-05-25 13:30:00.048\"),\n",
    "       pd.Timestamp(\"2016-05-25 13:30:00.048\"),\n",
    "       pd.Timestamp(\"2016-05-25 13:30:00.048\")\n",
    "   ],\n",
    "   \"ticker\": [\"MSFT\", \"MSFT\", \"GOOG\", \"GOOG\", \"AAPL\"],\n",
    "   \"price\": [51.95, 51.95, 720.77, 720.92, 98.0],\n",
    "   \"quantity\": [75, 155, 100, 100, 100]\n",
    "})\n",
    "\n",
    "print('trades:')\n",
    "display(trades)\n",
    "print('quotes:')\n",
    "display(quotes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When applying the asof join you can additionally used named arguments to ensure that it is possible to make a distinction between the tables that the columns originate. In this case suffixing with `_trades` and `_quotes`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trades.merge_asof(quotes, on='time', suffixes=('_trades', '_quotes'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using PyKX/q native functions\n",
    "\n",
    "While use of the Pandas-Like API and methods provided off PyKX Vectors provides an effective method of applying analytics on PyKX data the most efficient and performant way you can run analytics on your data is through the use of the PyKX/q primitives which are available through the `kx.q` module.\n",
    "\n",
    "These include functionality for the calculation of moving averages, application of asof/window joins, column reversal etc. A full list of the available functions and some examples of their usage can be found [here](../api/pykx-execution/q.md).\n",
    "\n",
    "Here are a few examples of usage of how you can use these functions, broken into sections for convenience\n",
    "\n",
    "#### Mathematical functions\n",
    "\n",
    "##### mavg\n",
    "\n",
    "Calculate a series of average values across a list using a rolling window"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.mavg(10, kx.random.random(10000, 2.0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### cor\n",
    "\n",
    "Calculate the correlation between two lists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.cor([1, 2, 3], [2, 3, 4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.cor(kx.random.random(100, 1.0), kx.random.random(100, 1.0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### prds\n",
    "\n",
    "Calculate the cumulative product across a supplied list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.prds([1, 2, 3, 4, 5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Iteration functions\n",
    "\n",
    "##### each\n",
    "\n",
    "Supplied both as a standalone primitive and as a method for PyKX Lambdas `each` allows you to pass individual elements of a PyKX object to a function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.each(kx.q('{prd x}'), kx.random.random([5, 5], 10.0, seed=10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q('{prd x}').each(kx.random.random([5, 5], 10.0, seed=10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Table functions\n",
    "\n",
    "##### meta\n",
    "\n",
    "Retrieval of metadata information about a table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "qtab = kx.Table(data = {\n",
    "    'x' : kx.random.random(1000, ['a', 'b', 'c']).grouped(),\n",
    "    'y' : kx.random.random(1000, 1.0),\n",
    "    'z' : kx.random.random(1000, kx.TimestampAtom.inf)\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.meta(qtab)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### xasc\n",
    "\n",
    "Sort the contents of a specified column in ascending order"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kx.q.xasc('z', qtab)"
   ]
  }
 ],
 "metadata": {
  "file_extension": ".py()",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "mimetype": "text/x-python",
  "name": "python",
  "npconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "version": 3
 },
 "nbformat": 4,
 "nbformat_minor": 4
}